Social Network Analysis Based on BSP Clustering Algorithm

ong
School of Business Administration, China University of Petroleum

ABSTRACT

Social network analysis is a new research field in data mining. Clustering in social network analysis differs from traditional clustering: it requires grouping objects into classes based on their links as well as their attributes. Traditional clustering algorithms group objects based only on object similarity, so they cannot be applied to social network analysis. Therefore, on the basis of the BSP (business system planning) clustering algorithm, a social network clustering analysis algorithm is proposed. The proposed algorithm, unlike traditional clustering algorithms, can group objects in a social network into different classes based on their links and identify the relations among classes.

INTRODUCTION

Social network analysis, which can be applied to analysis of the structure and properties of personal relationships, web page links, and the spread of messages, is a research field in sociology. Recently social network analysis has attracted increasing attention in the data mining research community. From the viewpoint of data mining, a social network is a heterogeneous and multi-relational dataset represented by a graph (Han & Kamber, 2006). Research on social network analysis in the data mining community includes the following areas: clustering analysis (Bhattacharya & Getoor, 2004; Kubica, Moore and Schneider, 2003), classification (Lu & Getoor, 2003), and link prediction (Liben-Nowell & Kleinberg, 2003; Krebs, 2002). Other achievements include PageRank (Page, Brin, Motwani and Winograd, 1998) and Hub-Authority (Kleinberg, 1999) in web search engines.

In this paper, clustering analysis of social networks is studied. In the second section, a social network clustering algorithm is proposed based on the BSP clustering algorithm. The algorithm can group objects in a social network into different classes based on their links, and it can also identify the relations among classes.
In the third section, an example of the social network clustering algorithm is presented, and then the conclusion and directions for future work are given.

SOCIAL NETWORK ANALYSIS BASED ON BSP CLUSTERING

There has been extensive research work on clustering in data mining. Traditional clustering algorithms (Han & Kamber, 2006) divide objects into classes based on their similarity. Objects in a class are similar to each other and very dissimilar from objects in other classes. Social network clustering analysis, unlike the traditional clustering problem, divides objects into classes based on their links as well as their attributes. The biggest challenge of social network clustering analysis is how to divide objects into classes based on the links between objects, so we need algorithms that can meet this challenge.

The BSP (business system planning) clustering algorithm (ao, Wu and, ) was proposed by IBM. It is designed to define the information architecture of a firm in business system planning. The algorithm analyzes business processes and their data classes, clusters business processes into sub-systems, and defines the relationships among these sub-systems. Basically, the BSP clustering algorithm uses objects (business processes) and links among objects (data classes) to perform clustering analysis. Similarly, a social network also includes objects and links among these objects. Given this shared precondition, the BSP clustering algorithm can be used in social network clustering analysis.

Communications of the IIMA 39 2007 Volume 7 Issue 4
According to graph theory, a social network is a directed graph composed of objects and their relationships. Figure 1 shows a sample social network: each circle represents an object, and each line with an arrow is an edge of the graph that represents a directed link between two objects.

[Figure 1: A sample of social network, with objects $O_1$ to $O_6$ and directed edges $E_1$ to $E_{10}$.]

In figure 1, let $O_i$ be an object in the social network ($i = 1, \dots, m$), and let $E_j$, which denotes a directed link between two objects, be a directed edge of the graph ($j = 1, \dots, n$). After defining objects and directed edges, we can also define the reachable relation between two objects. There are two kinds of reachable relation among objects:

1) One-step reachable relation: if there is a directed link from $O_i$ to $O_j$ through one and only one directed edge, then $O_i$ to $O_j$ is a one-step reachable relation. For instance, in figure 1 there is a directed link from $O_1$ to $O_2$ through the directed edge $E_1$, so $O_1$ to $O_2$ is a one-step reachable relation.
2) Multi-step reachable relation: if there is a directed link from $O_i$ to $O_j$ through two or more directed edges, then $O_i$ to $O_j$ is a multi-step reachable relation. For instance, in figure 1 there is a directed link from $O_1$ to $O_4$ through the directed edges $E_1$ and $E_5$, so $O_1$ to $O_4$ is a 2-step reachable relation.

After these definitions, we can use the BSP clustering algorithm to analyze a social network. The analysis proceeds in the following steps.

Generate edge creation matrix and edge pointed matrix

First, according to the objects and edges in the graph, define two matrices $L_c$ and $L_p$. Let $L_c$ be an $m \times n$ matrix that records the creation of edges. In the matrix, $L_c(i,j) = 1$ denotes that object $O_i$ connects with the tail of edge $E_j$, which means that $O_i$ creates the directed edge $E_j$; $L_c(i,j) = 0$ denotes that $O_i$ does not connect with the tail of edge $E_j$, which means that $E_j$ is not created by $O_i$. For example, in figure 1 object $O_1$ connects with the tail of $E_1$, so $O_1$ creates $E_1$ and $L_c(1,1) = 1$; $O_2$ does not connect with the tail of $E_1$, so $E_1$ is not created by $O_2$ and $L_c(2,1) = 0$.
Let $L_p$ be an $m \times n$ matrix that records the pointed relations of edges. In the matrix, $L_p(i,j) = 1$ denotes that object $O_i$ connects with the head of edge $E_j$, which means that $O_i$ is pointed to by the directed edge $E_j$; $L_p(i,j) = 0$ denotes that $O_i$ does not connect with the head of edge $E_j$, which means that $E_j$ does not point to $O_i$. For example, in figure 1 object $O_2$ connects with the head of $E_1$, which means $O_2$ is pointed to by $E_1$, so $L_p(2,1) = 1$. But $O_1$ does not connect with the head of $E_1$, which means $E_1$ does not point to $O_1$, so $L_p(1,1) = 0$.

Calculate one-step reachable matrix between objects

After the definition of $L_c$ and $L_p$, we can calculate the one-step reachable matrix $G$ between objects through the following equation.

$$G = L_c \odot L_p^{T}, \qquad g(i,j) = \bigvee_{k=1}^{n} \big( l_c(i,k) \wedge l_p^{T}(k,j) \big), \quad i = 1, \dots, m, \; j = 1, \dots, m \qquad (1)$$

Here $\wedge$ is the Boolean product, $\vee$ is the Boolean sum, and $\odot$ denotes the matrix product formed with these two operations. $g(i,j) = 1$ means $O_i$ to $O_j$ is a one-step reachable relation, and $g(i,j) = 0$ means there is no one-step reachable relation from $O_i$ to $O_j$. Through $G$, we can calculate all one-step reachable relations between objects.

Calculate multi-step reachable matrix between objects

Besides one-step reachable relations, there are also multi-step reachable relations between objects, so we need to calculate the multi-step reachable matrices as well (2-step, 3-step, ..., $(m-1)$-step). According to graph theory and the BSP clustering algorithm, we can calculate the multi-step reachable matrices $G^{2}, G^{3}, G^{4}, \dots, G^{m-1}$. The following equations show the calculation:

$$G^{2} = G \odot G, \qquad g^{2}(i,j) = \bigvee_{k=1}^{m} \big( g(i,k) \wedge g(k,j) \big), \quad i = 1, \dots, m, \; j = 1, \dots, m$$

$$G^{3} = G^{2} \odot G, \quad G^{4} = G^{3} \odot G, \quad \dots, \quad G^{m-1} = G^{m-2} \odot G \qquad (2)$$

These matrices include the 2-step, 3-step, ..., $(m-1)$-step reachable relations between objects. We can now read any multi-step reachable relation between two objects from $G^{2}, G^{3}, G^{4}, \dots, G^{m-1}$.

Calculate reachable matrix

Because we only consider whether a reachable relation exists between two objects, and do not care whether that relation is one-step or multi-step, we need to calculate the reachable matrix $R$ based on $G, G^{2}, G^{3}, G^{4}, \dots, G^{m-1}$. The calculation of $R$ is shown in the following equation:

$$R = I \vee G \vee G^{2} \vee \dots \vee G^{m-1} \qquad (3)$$
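As a rough illustration of equations (1) to (3), the following Python sketch builds the edge creation and edge pointed matrices from an edge list and derives the one-step and reachable matrices. The function names, the 0-based indexing, and the four-object edge list are assumptions made for this sketch; they do not come from the original algorithm description.

```python
def edge_matrices(m, edges):
    """Build the m x n edge creation and edge pointed matrices.

    `edges` is a list of (tail, head) pairs using 0-based object indices
    (an assumption of this sketch; the paper numbers objects from 1).
    """
    n = len(edges)
    L_c = [[0] * n for _ in range(m)]
    L_p = [[0] * n for _ in range(m)]
    for j, (tail, head) in enumerate(edges):
        L_c[tail][j] = 1  # object `tail` creates edge j
        L_p[head][j] = 1  # edge j points to object `head`
    return L_c, L_p


def transpose(A):
    return [list(row) for row in zip(*A)]


def bool_product(A, B):
    """Boolean matrix product: OR over k of (A[i][k] AND B[k][j])."""
    inner, cols = len(B), len(B[0])
    return [[int(any(A[i][k] and B[k][j] for k in range(inner)))
             for j in range(cols)]
            for i in range(len(A))]


def reachable_matrix(L_c, L_p):
    """Return (G, R) following equations (1) to (3)."""
    m = len(L_c)
    G = bool_product(L_c, transpose(L_p))  # equation (1)
    # Start from I OR G, then OR in G^2 ... G^(m-1).
    R = [[int(i == j) or G[i][j] for j in range(m)] for i in range(m)]
    power = G
    for _ in range(2, m):
        power = bool_product(power, G)  # equation (2)
        R = [[R[i][j] or power[i][j] for j in range(m)] for i in range(m)]
    return G, R


# Hypothetical network: O1 <-> O2, O2 -> O3, O3 <-> O4 (0-based below).
edges = [(0, 1), (1, 0), (1, 2), (2, 3), (3, 2)]
L_c, L_p = edge_matrices(4, edges)
G, R = reachable_matrix(L_c, L_p)
for row in R:
    print(row)
```

In this small network the first two objects reach every object, while the last two reach only each other, which is what the rows of `R` record.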
In equation (3), $\vee$ is the Boolean sum and $I$ is the unit matrix. $R(i,j) = 1$ means a reachable relation exists from $O_i$ to $O_j$. The reachable relations in matrix $R$ are not necessarily mutual: $R(i,j) = 1$ means a reachable relation exists from $O_i$ to $O_j$, but it does not mean that a reachable relation exists from $O_j$ to $O_i$. Mutual reachable relations between two objects are important in a social network, so we need to calculate the mutual reachable matrix based on $R$.

Calculate mutual reachable matrix and generate clusters

The mutual reachable matrix can be calculated through the following equation.

$$Q = R \wedge R^{T}, \qquad q(i,j) = r(i,j) \wedge r(j,i) \qquad (4)$$

Here $\wedge$ means the Boolean product, applied element by element. In the matrix $Q$, $Q(i,j) = 1$ means there is a mutual reachable relation between $O_i$ and $O_j$. In a social network, two objects that have a mutual reachable relation should belong to the same class, so we can cluster based on $Q$: we divide the social network into classes according to the strong sub-matrices in $Q$ or in adjusted $Q$. A strong sub-matrix is defined as follows.

Strong sub-matrix: if all elements in a sub-matrix of $Q$ are 1, this sub-matrix is a strong sub-matrix.

Identify relationships among classes

After clustering the social network, we also need to identify the relationships among clusters. This can be done through the generated clusters and the one-step reachable matrix. If there is a one-step reachable relation between two objects in different classes, we can say that a directed link exists between the two classes. Through $G$, we can identify all relations among classes.

After the previous 6 steps, the social network is divided into classes. The social network clustering analysis algorithm can be given as follows:

Input: $L_c$: edge creation matrix
       $L_p$: edge pointed matrix
Begin
    $G = L_c \odot L_p^{T}$
    $G^{2} = G \odot G$
    for $k = 3$ to $m-1$ do
        $G^{k} = G^{k-1} \odot G$
    $R = I \vee G \vee G^{2} \vee \dots \vee G^{m-1}$
    $Q = R \wedge R^{T}$
    $Q \Rightarrow C_k$
    $(C_k, G) \rightarrow \mathrm{Relation}(C_k)$
End
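The last two steps of the listing can be sketched in Python as follows. The sketch relies on the observation that the reachable matrix contains the unit matrix and is closed under chaining, so the mutual reachable matrix groups objects into disjoint classes and each row of $Q$ lists exactly one class. The function names and the small hard-coded matrices are illustrative assumptions, not part of the original paper.

```python
def mutual_reachable(R):
    """Q(i, j) = R(i, j) AND R(j, i), as in equation (4)."""
    m = len(R)
    return [[R[i][j] & R[j][i] for j in range(m)] for i in range(m)]


def clusters_from_q(Q):
    """Read one class per distinct row of Q.

    Each row of Q marks the objects mutually reachable with that row's
    object, i.e. the members of one strong sub-matrix.
    """
    clusters, seen = [], set()
    for i, row in enumerate(Q):
        if i in seen:
            continue
        members = [j for j, bit in enumerate(row) if bit]
        clusters.append(members)
        seen.update(members)
    return clusters


def class_relations(G, clusters):
    """Return pairs (a, b) meaning class a points to class b.

    A pair is reported when some object in class a has a one-step
    reachable relation to some object in a different class b.
    """
    label = {obj: c for c, members in enumerate(clusters) for obj in members}
    relations = set()
    for i, row in enumerate(G):
        for j, bit in enumerate(row):
            if bit and label[i] != label[j]:
                relations.add((label[i], label[j]))
    return relations


# Hypothetical 4-object network (0-based): O1 <-> O2, O2 -> O3, O3 <-> O4.
G = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
R = [[1, 1, 1, 1],
     [1, 1, 1, 1],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]

Q = mutual_reachable(R)
clusters = clusters_from_q(Q)
relations = class_relations(G, clusters)
print(clusters, relations)
```

Reading classes off the rows of `Q` avoids having to reorder rows and columns to expose the strong sub-matrices; that is a design choice of this sketch rather than something the paper prescribes.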
In the algorithm listing, $Q \Rightarrow C_k$ means generating clusters through the mutual reachable matrix $Q$, and $(C_k, G) \rightarrow \mathrm{Relation}(C_k)$ means identifying relationships among clusters based on the clusters and the one-step reachable matrix.

EXAMPLE

Now an example is given to show the process of cluster analysis of a social network. Suppose a social network as shown in figure 1. According to the figure, we can give the edge creation matrix $L_c$ and the edge pointed matrix $L_p$ as follows.

[Matrices $L_c$ and $L_p$]

According to the social network clustering algorithm, and given $L_c$ and $L_p$, clustering the social network proceeds in the following steps.

Calculate one-step reachable matrix between objects

$G = L_c \odot L_p^{T}$

[Matrix $G$]

Calculate multi-step reachable matrix between objects

$G^{2} = G \odot G$, $G^{3} = G^{2} \odot G$, $G^{4} = G^{3} \odot G$, $G^{5} = G^{4} \odot G$

[Matrices $G^{2}$, $G^{3}$, $G^{4}$, $G^{5}$]
Calculate reachable matrix based on one-step and multi-step reachable matrix

$R = I \vee G \vee G^{2} \vee \dots \vee G^{5}$

[Matrix $R$]

Calculate mutual reachable matrix, generate clusters

$Q = R \wedge R^{T}$

[Matrix $Q$]

The mutual reachable matrix $Q$ includes two strong sub-matrices, so we can divide the network in figure 1 into two classes: the first class $C_1$ includes objects $O_1, O_2, O_3$, and the second class $C_2$ includes $O_4, O_5, O_6$.

Identify relationships among classes

According to the one-step reachable matrix, there are one-step reachable relations between the two classes ($O_2 \rightarrow O_4$ and $O_3 \rightarrow O_4$), so we can identify the relation between the two clusters $C_1$ and $C_2$, as figure 2 shows.

[Figure 2: Identify relationships between two clusters. $C_1$ contains $O_1, O_2, O_3$; $C_2$ contains $O_4, O_5, O_6$.]

In figure 2, $C_1$ points to $C_2$, but $C_2$ does not point to $C_1$, so we can identify the relations between the two classes.
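Since the numeric matrices of the example are not reproduced here, the following Python sketch runs the same steps on a hypothetical six-object network. Only the two cross-class links ($O_2 \rightarrow O_4$ and $O_3 \rightarrow O_4$) are chosen to match the ones reported in the example; every other edge in `EDGES` is an assumption picked so that each class is mutually reachable, and the list has fewer edges than figure 1. It is meant as a plausibility check of the procedure, not as a reconstruction of the original data.

```python
# Hypothetical edge list as (tail, head) pairs with 1-based object numbers.
# Assumed for illustration; NOT the edge set of the paper's figure 1.
EDGES = [(1, 2), (2, 3), (3, 1),   # assumed links inside the first class
         (2, 4), (3, 4),           # cross-class links named in the example
         (4, 5), (5, 6), (6, 4)]   # assumed links inside the second class
M = 6


def one_step(edges, m):
    """One-step reachable matrix built directly from the edge list."""
    G = [[0] * m for _ in range(m)]
    for tail, head in edges:
        G[tail - 1][head - 1] = 1
    return G


def reachable(G):
    """R = I OR G OR G^2 OR ... OR G^(m-1), using Boolean products."""
    m = len(G)
    R = [[int(i == j) or G[i][j] for j in range(m)] for i in range(m)]
    power = G
    for _ in range(2, m):
        power = [[int(any(power[i][k] and G[k][j] for k in range(m)))
                  for j in range(m)] for i in range(m)]
        R = [[R[i][j] or power[i][j] for j in range(m)] for i in range(m)]
    return R


G = one_step(EDGES, M)
R = reachable(G)
Q = [[R[i][j] & R[j][i] for j in range(M)] for i in range(M)]

# Each row of Q lists the members of one class; duplicates collapse in the set.
classes = sorted({tuple(f"O{j + 1}" for j in range(M) if Q[i][j])
                  for i in range(M)})
# One-step links whose endpoints are not mutually reachable cross classes.
cross = sorted((f"O{i + 1}", f"O{j + 1}")
               for i in range(M) for j in range(M)
               if G[i][j] and not Q[i][j])

print(classes)  # [('O1', 'O2', 'O3'), ('O4', 'O5', 'O6')]
print(cross)    # [('O2', 'O4'), ('O3', 'O4')]
```

Under these assumed edges the sketch yields the same two classes and the same direction of the class relation as the example: the first class points to the second, and not the reverse.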
CONCLUSION

In this paper, an algorithm for social network clustering analysis is proposed based on the BSP clustering algorithm. It divides a social network into different classes according to the objects in the social network and the links between them, and it can also identify relations among clusters. The main disadvantage of this algorithm is that it uses matrices to store edges and reachable relations; in a real social network these matrices will be very large and cannot be loaded into main memory. However, because these matrices are very sparse, an efficient data structure can be designed to overcome this shortcoming. In addition, in our algorithm the edges between objects all have the same weight, whereas in the real world such edges may have different weights. The properties of each cluster have also not been analyzed. These issues will be addressed in our future research.

REFERENCES

Bhattacharya I, Getoor L. (2004). Iterative Record Linkage for Cleaning and Integration. Proceedings of the SIGMOD 2004 workshop on research issues on data mining and knowledge discovery, Paris, France, 11-18.

ao X, Wu S, B. (). Management Information System. Beijing: Economy and Management Press (in Chinese).

Han J, Kamber M. (2006). Data Mining: Concepts and Techniques, 2nd edition. San Francisco: Morgan Kaufmann Publishers.

Kleinberg J. (1999). Authoritative sources in a hyperlinked environment. Journal of the ACM, 5, 604-632.

Krebs V. (2002). Mapping networks of terrorist cells. Connections, 24, 43-52.

Kubica J, Moore A, Schneider J. (2003). Tractable Group Detection on Large Link Data Sets. Proceedings of the 3rd IEEE international conference on data mining, Melbourne, FL, 573-576.

Liben-Nowell D, Kleinberg J. (2003). The link prediction problem for social networks. Proceedings of the 2003 international conference on information and knowledge management, New Orleans, LA, 556-559.

Lu Q, Getoor L. (2003). Link-based classification. Proceedings of the 2003 international conference on machine learning, Washington DC, 496-503.

Page L, Brin S, Motwani R, Winograd T. (1998). The PageRank citation ranking: Bringing order to the web. Technical report, Stanford University.