Social Network Analysis Based on BSP Clustering Algorithm




ong
School of Business Administration, China University of Petroleum

ABSTRACT

Social network analysis is a new research field in data mining. Clustering in social network analysis differs from traditional clustering: it requires grouping objects into classes based on their links as well as their attributes. Traditional clustering algorithms group objects based only on the objects' similarity, so they cannot be applied to social network analysis. Therefore, on the basis of the BSP (business system planning) clustering algorithm, a social network clustering analysis algorithm is proposed. The proposed algorithm, unlike traditional clustering algorithms, can group objects in a social network into different classes based on their links and identify the relations among classes.

INTRODUCTION

Social network analysis, which can be applied to analysis of the structure and properties of personal relationships, web page links, and the spread of messages, is a research field in sociology. Recently social network analysis has attracted increasing attention in the data mining research community. From the viewpoint of data mining, a social network is a heterogeneous and multi-relational dataset represented by a graph (Han & Kamber, 2006). Research on social network analysis in the data mining community includes the following areas: clustering analysis (Bhattacharya & Getoor, 2005; Kubica, Moore and Schneider, 2003), classification (Lu & Getoor, 2003), and link prediction (Liben-Nowell & Kleinberg, 2003; Krebs, 2002). Other achievements include PageRank (Page, Brin, Motwani and Winograd, 1998) and Hub-Authority (Kleinberg, 1999) in web search engines.

In this paper, clustering analysis of social networks is studied. In the second section, a social network clustering algorithm is proposed based on the BSP clustering algorithm. The algorithm can group objects in a social network into different classes based on their links, and it can also identify the relations among classes.
In the third section, an example of the social network clustering algorithm is presented, followed by the conclusion and directions for future work.

SOCIAL NETWORK ANALYSIS BASED ON BSP CLUSTERING

There has been extensive research on clustering in data mining. Traditional clustering algorithms (Han & Kamber, 2006) divide objects into classes based on their similarity: objects in a class are similar to each other and very dissimilar from objects in other classes. Social network clustering analysis, unlike the traditional clustering problem, divides objects into classes based on their links as well as their attributes. The biggest challenge of social network clustering analysis is how to divide objects into classes based on the links among them, so algorithms that can meet this challenge are needed.

The BSP (business system planning) clustering algorithm (ao, Wu and, ) was proposed by IBM. It is designed to define the information architecture of a firm in business system planning. The algorithm analyses business processes and their data classes, clusters the business processes into sub-systems, and defines the relationships among these sub-systems. Basically, the BSP clustering algorithm uses objects (business processes) and links among objects (data classes) to perform clustering analysis. A social network likewise consists of objects and links among these objects. Given this shared precondition, the BSP clustering algorithm can be used in social network clustering analysis.

Communications of the IIMA 39 2007 Volume 7 Issue 4

According to graph theory, a social network is a directed graph composed of objects and their relationships. Figure 1 shows a sample social network: each circle represents an object, and each line with an arrow is an edge of the graph, representing a directed link between two objects.

Figure 1: A sample of social network. [Directed graph with objects $O_1$ to $O_6$ and edges $E_1$ to $E_{10}$.]

In Figure 1, let $O_i$ ($i = 1, \dots, m$) be an object in the social network, and let $E_j$ ($j = 1, \dots, n$) be a directed edge of the graph, representing a directed link between two objects. With objects and directed edges defined, we can also define the reachable relation between two objects. There are two kinds of reachable relation among objects:

1) One-step reachable relation: if there is a directed link from $O_i$ to $O_j$ through one and only one directed edge, then $O_i$ to $O_j$ is a one-step reachable relation. For instance, any two objects of Figure 1 that are joined by a single directed edge are in a one-step reachable relation, in the direction of that edge.

2) Multi-steps reachable relation: if there is a directed link from $O_i$ to $O_j$ through two or more directed edges, then $O_i$ to $O_j$ is a multi-steps reachable relation. For instance, in Figure 1 there is a directed link into $O_4$ through two directed edges, the second of which is $E_5$, so the relation from the starting object to $O_4$ is a 2-steps reachable relation.

After these definitions, we can use the BSP clustering algorithm to analyse a social network. The analysis proceeds in the following steps.

Generate edge creation matrix and edge pointed matrix

First, according to the objects and edges in the graph, define two matrices $L_1$ and $L_2$. Let $L_1$ be an $m \times n$ matrix describing the creation of edges. In this matrix, $L_1(i,j) = 1$ denotes that object $O_i$ connects with the tail of edge $E_j$, which means that $O_i$ creates the directed edge $E_j$. $L_1(i,j) = 0$ denotes that $O_i$ does not connect with the tail of edge $E_j$, which means that $E_j$ is not created by $O_i$. For example, an object of Figure 1 that connects with the tail of an edge creates that edge, so the corresponding entry of $L_1$ is 1; an object that does not connect with the tail of that edge does not create it, so its entry is 0.
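The construction of the edge creation matrix can be sketched in a few lines of Python. The edge list below is a hypothetical illustration, not the edge list of Figure 1, and the helper name `incidence_matrix` is an assumption of this sketch. The same helper, pointed at edge heads instead of tails, gives the edge pointed matrix $L_2$ that the text defines next.

```python
# Hypothetical edge list for illustration only (NOT the edges of Figure 1):
# each pair is (tail object, head object), with 1-based object indices.
EDGES = [(1, 2), (2, 1), (2, 3), (3, 1), (3, 4)]
M = 4  # number of objects (assumed for this sketch)


def incidence_matrix(m, edges, end):
    """Return an m x n 0/1 matrix; end=0 marks edge tails, end=1 marks edge heads."""
    matrix = [[0] * len(edges) for _ in range(m)]
    for j, edge in enumerate(edges):
        # Object edge[end] touches edge number j+1 at the chosen end.
        matrix[edge[end] - 1][j] = 1
    return matrix


L1 = incidence_matrix(M, EDGES, end=0)  # edge creation matrix (tails)
L2 = incidence_matrix(M, EDGES, end=1)  # edge pointed matrix (heads)

for row in L1:
    print(row)
```

Every column of either matrix contains exactly one 1, because each directed edge has exactly one tail and one head.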

Let $L_2$ be an $m \times n$ matrix describing the pointed relations of edges. In this matrix, $L_2(i,j) = 1$ denotes that object $O_i$ connects with the head of edge $E_j$, which means that $O_i$ is pointed to by the directed edge $E_j$. $L_2(i,j) = 0$ denotes that $O_i$ does not connect with the head of edge $E_j$, which means that $E_j$ does not point to $O_i$. For example, an object of Figure 1 that connects with the head of an edge is pointed to by that edge, so the corresponding entry of $L_2$ is 1; an object that does not connect with the head of that edge is not pointed to by it, so its entry is 0.

Calculate one-step reachable matrix between objects

After the definition of $L_1$ and $L_2$, we can calculate the one-step reachable matrix $G$ between objects through the following equation:

$$G = L_1 \odot L_2^{T}, \qquad g(i,j) = \bigoplus_{k=1}^{n} \bigl( l_1(i,k) \odot l_2^{T}(k,j) \bigr), \quad i = 1, \dots, m,\; j = 1, \dots, m \tag{1}$$

Here $\odot$ is the Boolean product and $\oplus$ is the Boolean sum. $g(i,j) = 1$ means that $O_i$ to $O_j$ is a one-step reachable relation, and $g(i,j) = 0$ means that there is no one-step reachable relation from $O_i$ to $O_j$. Through $G$, we can calculate all one-step reachable relations between objects.

Calculate multi-steps reachable matrix between objects

Besides one-step reachable relations, there are also multi-steps reachable relations between objects, so we need to calculate the multi-steps reachable matrices (2-steps, 3-steps, ..., $(m-1)$-steps). According to graph theory and the BSP clustering algorithm, these are the matrices $G^2, G^3, G^4, \dots, G^{m-1}$, calculated as follows:

$$g^{2}(i,j) = \bigoplus_{k=1}^{m} \bigl( g(i,k) \odot g(k,j) \bigr), \quad i = 1, \dots, m,\; j = 1, \dots, m$$

$$G^{2} = G \odot G, \quad G^{3} = G^{2} \odot G, \quad G^{4} = G^{3} \odot G, \quad \dots, \quad G^{m-1} = G^{m-2} \odot G \tag{2}$$

These matrices contain the 2-steps, 3-steps, ..., $(m-1)$-steps reachable relations between objects. The $k$-steps reachable relation between two objects can now be read from $G^{k}$, for $k = 2, \dots, m-1$.

Calculate reachable matrix

Because we only consider whether a reachable relation exists between two objects, and do not care whether it is one-step or multi-steps, we need to calculate the reachable matrix $R$ based on $G, G^2, G^3, \dots, G^{m-1}$. The calculation of $R$ is shown in the following equation:

$$R = I \oplus G \oplus G^{2} \oplus \dots \oplus G^{m-1} \tag{3}$$
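A minimal Python sketch of equations (1) to (3) follows, under the assumption that $L_1$ and $L_2$ are given as lists of 0/1 rows. The function names (`bool_product`, `bool_sum`, `reachable_matrices`) and the three-object chain used as input are illustrative choices, not part of the original algorithm description.

```python
def bool_product(A, B):
    """Boolean matrix product: entry (i, j) is the OR over k of A[i][k] AND B[k][j]."""
    return [[int(any(A[i][k] and B[k][j] for k in range(len(B))))
             for j in range(len(B[0]))] for i in range(len(A))]


def bool_sum(A, B):
    """Element-wise Boolean sum (OR) of two matrices of equal shape."""
    return [[a | b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def reachable_matrices(L1, L2):
    """Return the one-step matrix G and the reachable matrix R."""
    m = len(L1)
    G = bool_product(L1, transpose(L2))                      # equation (1)
    identity = [[int(i == j) for j in range(m)] for i in range(m)]
    R = bool_sum(identity, G)
    Gk = G
    for _ in range(2, m):                                    # k = 2, ..., m-1
        Gk = bool_product(Gk, G)                             # equation (2)
        R = bool_sum(R, Gk)                                  # equation (3)
    return G, R


# Hypothetical chain O1 -> O2 -> O3 with two edges (not the network of Figure 1).
L1 = [[1, 0], [0, 1], [0, 0]]  # tails: E1 leaves O1, E2 leaves O2
L2 = [[0, 0], [1, 0], [0, 1]]  # heads: E1 enters O2, E2 enters O3
G, R = reachable_matrices(L1, L2)
print(G)  # [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
print(R)  # [[1, 1, 1], [0, 1, 1], [0, 0, 1]]
```

In the chain, $O_1$ reaches $O_3$ only in two steps, so the entry appears in $R$ through $G^2$ but not in $G$.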

Here $\oplus$ is the Boolean sum and $I$ is the unit matrix. $R(i,j) = 1$ means that a reachable relation exists from $O_i$ to $O_j$, but the reachable relations in matrix $R$ are not mutual: $R(i,j) = 1$ does not mean that a reachable relation exists from $O_j$ to $O_i$. Mutual reachable relations between two objects are important in a social network, so we need to calculate the mutual reachable matrix based on $R$.

Calculate mutual reachable matrix and generate clusters

The mutual reachable matrix can be calculated through the following equation:

$$Q = R \wedge R^{T} \tag{4}$$

Here $\wedge$ means the Boolean product taken element by element, $q(i,j) = r(i,j) \wedge r(j,i)$. In the matrix, $Q(i,j) = 1$ means that there is a mutual reachable relation between $O_i$ and $O_j$. In a social network, two objects that have a mutual reachable relation should belong to the same class, so we can cluster based on $Q$: according to the mutual reachable matrix $Q$, we divide a social network into classes based on the strong sub-matrices in $Q$ or adjusted $Q$. A strong sub-matrix is defined as follows.

Strong sub-matrix: if all elements in a sub-matrix of $Q$ are 1, this sub-matrix is a strong sub-matrix.

Identify relationships among classes

After clustering the social network, we also need to identify the relationships among clusters. This can be done through the generated clusters and the one-step reachable matrix. If there is a one-step reachable relation between two objects in different classes, we can say that a directed link exists between the classes. Through $G$ we can identify all relations among classes.

After the previous steps, we can divide a social network into classes. The social network clustering analysis algorithm can be given as follows:

Input:
    L_1 : edge creation matrix
    L_2 : edge pointed matrix
Begin
    G = L_1 ⊙ L_2^T
    G^2 = G ⊙ G
    for k = 3 to m-1 do
        G^k = G^(k-1) ⊙ G
    R = I ⊕ G ⊕ G^2 ⊕ ... ⊕ G^(m-1)
    Q = R ∧ R^T
    Q -> C_k
    (C_k, G) -> Relation(C_k)
End
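The last two steps can be sketched in Python as below. This is a hedged illustration: it assumes $R$ already contains the unit matrix and all multi-steps relations, so that each row of $Q$ lists exactly the members of one class. The function names and the four-object input (0-based indices) are hypothetical.

```python
def mutual_reachable(R):
    """Equation (4): q(i, j) = r(i, j) AND r(j, i)."""
    m = len(R)
    return [[R[i][j] & R[j][i] for j in range(m)] for i in range(m)]


def clusters_from(Q):
    """Group objects whose rows of Q mark them as mutually reachable."""
    clusters, assigned = [], set()
    for i in range(len(Q)):
        if i not in assigned:
            members = [j for j, q in enumerate(Q[i]) if q]
            clusters.append(members)
            assigned.update(members)
    return clusters


def cluster_relations(G, clusters):
    """Return (source class, target class) pairs joined by a one-step relation."""
    label = {obj: c for c, members in enumerate(clusters) for obj in members}
    return sorted({(label[i], label[j])
                   for i, row in enumerate(G) for j, g in enumerate(row)
                   if g and label[i] != label[j]})


# Hypothetical network: 0 <-> 1, 1 -> 2, 2 <-> 3 (not the network of Figure 1).
G = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
R = [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 1, 1], [0, 0, 1, 1]]
Q = mutual_reachable(R)
clusters = clusters_from(Q)
print(clusters)                        # [[0, 1], [2, 3]]
print(cluster_relations(G, clusters))  # [(0, 1)]
```

Reading a class directly from a row of $Q$ avoids reordering rows and columns to expose the strong sub-matrices, although both approaches identify the same groups under the stated assumption.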

$Q \to C_k$ means generating clusters through the mutual reachable matrix $Q$, and $(C_k, G) \to \mathrm{Relation}(C_k)$ means identifying relationships among clusters based on the clusters and the one-step reachable matrix.

EXAMPLE

An example is now given to show the process of cluster analysis of a social network. Suppose a social network as Figure 1 shows. According to the figure, we can give the edge creation matrix $L_1$ and the edge pointed matrix $L_2$.

[Matrices $L_1$ and $L_2$ for Figure 1]

According to the social network clustering algorithm, and using $L_1$ and $L_2$, the social network is clustered in the following steps.

Calculate one-step reachable matrix between objects

[Matrix $G$]

Calculate multi-steps reachable matrix between objects

[Matrices $G^2$, $G^3$, $G^4$, $G^5$]

Calculate reachable matrix based on one-step and multi-steps reachable matrix

$$R = I \oplus G \oplus G^{2} \oplus \dots \oplus G^{5}$$

[Matrix $R$]

Calculate mutual reachable matrix, generate clusters

$$Q = R \wedge R^{T}$$

[Matrix $Q$]

The mutual reachable matrix $Q$ includes two strong sub-matrices, so we can divide Figure 1 into two classes: the first class $C_1$ includes objects $O_1$, $O_2$, $O_3$, and the second class $C_2$ includes $O_4$, $O_5$, $O_6$.

Identify relationships among classes

According to the one-step reachable matrix $G$, there are one-step reachable relations between the two classes (links into $O_4$ from $O_3$ and from one other object of $C_1$), so we can identify the relation between the two clusters $C_1$ and $C_2$, as Figure 2 shows.

Figure 2: Identify relationships between two clusters. [$C_1 = \{O_1, O_2, O_3\}$ points to $C_2 = \{O_4, O_5, O_6\}$.]

In Figure 2, $C_1$ points to $C_2$, but $C_2$ does not point to $C_1$, so we can identify the relation between the two classes.
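Because the matrices of the example did not survive in this copy, the whole procedure can be replayed on a stand-in network. The sketch below uses a hypothetical edge list with 6 objects and 10 directed edges, chosen only so that it yields the same class structure as the example; it is not the edge list of Figure 1, and `bsp_cluster` is an illustrative name.

```python
# Hypothetical stand-in network: (tail, head) pairs, 1-based object indices.
EDGES = [(1, 2), (2, 1), (2, 3), (3, 1), (2, 4), (3, 4),
         (4, 5), (5, 6), (6, 4), (6, 5)]
M = 6


def bool_product(A, B):
    """Boolean matrix product: OR over k of A[i][k] AND B[k][j]."""
    return [[int(any(A[i][k] and B[k][j] for k in range(len(B))))
             for j in range(len(B[0]))] for i in range(len(A))]


def bsp_cluster(m, edges):
    n = len(edges)
    # Step 1: edge creation (tails) and edge pointed (heads) matrices.
    L1 = [[int(edges[j][0] == i + 1) for j in range(n)] for i in range(m)]
    L2 = [[int(edges[j][1] == i + 1) for j in range(n)] for i in range(m)]
    # Step 2: one-step reachable matrix G = L1 (.) L2^T.
    G = bool_product(L1, [list(col) for col in zip(*L2)])
    # Steps 3 and 4: accumulate R = I + G + G^2 + ... + G^(m-1).
    R = [[int(i == j) | G[i][j] for j in range(m)] for i in range(m)]
    Gk = G
    for _ in range(2, m):
        Gk = bool_product(Gk, G)
        R = [[R[i][j] | Gk[i][j] for j in range(m)] for i in range(m)]
    # Step 5: mutual reachable matrix and classes (1-based object numbers).
    Q = [[R[i][j] & R[j][i] for j in range(m)] for i in range(m)]
    classes = sorted({tuple(j + 1 for j in range(m) if Q[i][j]) for i in range(m)})
    # Step 6: relations among classes, read from G.
    label = {obj: c for c, members in enumerate(classes, start=1) for obj in members}
    relations = sorted({(label[i + 1], label[j + 1])
                        for i in range(m) for j in range(m)
                        if G[i][j] and label[i + 1] != label[j + 1]})
    return classes, relations


classes, relations = bsp_cluster(M, EDGES)
print(classes)    # [(1, 2, 3), (4, 5, 6)]
print(relations)  # [(1, 2)]
```

On this stand-in, class 1 points to class 2 and not the reverse, which mirrors the relationship the example reports between $C_1$ and $C_2$.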

CONCLUSION

In this paper, an algorithm for social network clustering analysis is proposed based on the BSP clustering algorithm. It divides a social network into different classes according to the objects in the network and the links between them, and it can also identify relations among clusters. The main disadvantage of this algorithm is that it uses matrices to store edges and reachable relations; in a real social network these matrices will be very large and cannot be loaded into main memory. But because these matrices are very sparse, an efficient data structure can be designed to overcome this shortcoming. Also, in our algorithm the edges between objects all have the same weight, whereas in the real world such edges may have different weights. Meanwhile, the properties of each cluster have not been analyzed. These issues will be addressed in our future research.

REFERENCES

Bhattacharya I, Getoor L. (2004). Iterative Record Linkage for Cleaning and Integration. Proceeding SIGMOD 2004 workshop on research issues on data mining and knowledge discovery, Paris, France, 11-18.

ao X, Wu S, B. (). Management Information System. Beijing: Economy and Management Press (in Chinese).

Han J, Kamber M. (2006). Data Mining: Concepts and Techniques, 2nd edition. San Francisco: The Morgan Kaufmann Publishers.

Kleinberg J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 5, 604-632.

Krebs V. (2002). Mapping networks of terrorist cells. Connections, 24, 43-52.

Kubica J, Moore A, Schneider J. (2003). Tractable Group Detection on Large Link Data Sets. Proceeding 3rd IEEE international conference on data mining, Melbourne, FL, 573-576.

Liben-Nowell D, Kleinberg J. (2003). The Link prediction problem for social networks. Proceeding 2003 international conference on information and knowledge management, New Orleans, LA, 556-559.

Lu Q, Getoor L. (2003). Link-based classification. Proceeding 2003 international conference on machine learning, Washington DC, 496-503.

Page L, Brin S, Motwani R, Winograd. (1998). The PageRank citation ranking: Bring order to the web. Technical report, Stanford University.