Visual Analysis of People's Calling Network from CDR data
- Lisa Morgan Jordan
- 10 years ago
Category: Research

Figure 1: Four visualizations for analyzing the MIT Reality Mining dataset: (a) radial tree view; (b) radial tree view of selected hierarchy and groups; (c) statistic view; (d) spiral view. Group labels in the figure: Sloan Business School, Media Lab Graduate Student, Media Lab Staff, Media Lab First Year. (a) depicts the entire calling network, in which each subject is represented as a leaf node with a unique identification. (b) displays the result of interactive hierarchy and group selection based on (a). (c) shows the calling pattern (incoming and outgoing calls) of ID 8. (d) shows the social connections of ID 11 (shown in yellow) with a spiral view.

ABSTRACT
Call detail records (CDR) are widely used data in Social Network Analysis (SNA). This paper introduces a novel visual analysis technique for characterizing one person's calling network and revealing that person's underlying statistical calling information and social communities. We represent the entire calling network with a hierarchical radial kd-tree view, and allow the user to interactively analyze the social groups by selecting the hierarchy and groups of the radial tree. To inspect the calling pattern of one subject, a statistic view is provided with a radar chart representation. In addition, a spiral layout is employed to reveal the closeness relationship between one subject and the groups that have direct or indirect calling connections with him or her. The statistic view can be used to compare the calling patterns of different persons, or of different time periods for one subject. We demonstrate the effectiveness of our approach with the CDR data of the IEEE VAST 2008 mini challenge and the MIT Reality Mining dataset.

1 INTRODUCTION
Within the last two decades, large numbers of call detail record (CDR) datasets have become available. A CDR dataset includes the time, duration, caller and callee, and the locations of the cellular towers.
For instance, the CDR dataset from the IEEE VAST Challenge 2008 contains call records from 400 persons over 10 days (June 1 to June 10, 2006). Such a dataset implies a communication network and provides meaningful information about the characteristics of human behavior. Much effort has been devoted to studying the structure of mobile phone networks [3, 5], revealing personal life pace and reactions to outlier events [3], and classifying different social networks [5] (e.g., friendship, organizations). Further studies have addressed the dynamics of mobile phone networks [11], favoring the analysis of how relationships evolve over time.

Current research on mobile phone networks has greatly advanced the understanding of mobility patterns and human behavior. Yet easy access to the characteristics of a given subject (caller or callee) remains a challenging problem. Conventional statistical and visualization techniques can help address this problem, but tend to be inefficient for the following reasons. First, a mobile phone network is very large. For example, the MIT Reality Mining project [4] collects data from 100 mobile phones for 9 months, and captures more than 3 million cell phone activities. Second, a mobile phone network contains complex information. Eagle et al. [3] have successfully identified different types of students based on the expansion rates of their mobile phone networks. This complexity makes it difficult to achieve a comprehensive visualization at limited screen resolutions. Third, the time-varying nature of the mobile phone network and the evolution of a person's communication network are difficult to capture.

The primary goal of this work is to analyze the characteristics of one subject's communication network, and to enable comparison of subjects from different communities.
The main idea follows a cluster-and-analyze procedure: prior to detailed analysis with respect to one subject, the entire network is clustered by means of an improved version of the Girvan-Newman algorithm [9]. Specifically, the first stage clusters the dataset into a hierarchical kd-tree, which supports interactive adjustment of the level of detail. By converting the tree into a hierarchical radial layout, and further flattening it into a circle, the network is mapped onto a uniformly divided ring. In the analysis stage, the influence of each group (a patch of the ring) on the subject at a given time is represented as an influence spot between the ring center and the cluster center on the ring. Sequentially connecting the influence spots of all clusters yields a polar chart with respect to the subject. The chart changes dynamically
along the time line, facilitating effective inspection of the subject's calling network. Thus, it allows for analyzing not only the evolution of one person's calling network, but also the patterns of connections along the time line, such as the differences between days and nights.

We also introduce a novel spiral view to characterize the closeness relationship between one subject and the groups that have direct or indirect calling connections with him or her. With the spiral view, comparing large calling networks becomes easy. One main advantage of the spiral view is that each unit of the spiral layout is a sub-group of the entire calling network, whose network and calling pattern can be structured well and analyzed simultaneously. We evaluate the effectiveness of our approach with the CDR data of the IEEE VAST 2008 mini challenge and the MIT Reality Mining dataset.

The rest of this paper is organized as follows. Section 2 reviews the related work. Our approach is explained in Section 3. Experimental results and analysis are given in Section 4. Section 5 concludes this paper and highlights future work.

Figure 2: Existing social network visualization techniques. (a-c) Three node-link representations from existing literature: (a) [10], (b) [20], (c) [18]. (d) The treemap representation [6].

2 RELATED WORK
The term social network, which covers structures such as the mobile phone network, is widely used to describe social activities. Designed for effective analysis and visualization, the node-link representation has a long history dating back to the 1930s [7]. The node-link layout, in which the nodes represent the members of the network and the edges represent their relationships, has the advantage of being intuitive and intelligible. Typically, a hierarchical structure is employed to provide a level-of-detail view of the underlying network [12]. Recent work [10, 14, 18, 19, 20] greatly improves the efficiency and refines the layout.
One key step of these approaches is to employ a well-defined clustering algorithm to group members into identifiable communities. Most of them pay equal attention to all connections, and thus can hardly emphasize special characteristics. For large datasets with increased connectivity, cluttered results may be produced at limited screen resolutions, as shown in Figure 2 (a-c).

Adjacency matrix-based views [8] are capable of showing large social networks. However, an adjacency matrix provides very limited freedom to interact with the underlying data, making it difficult to perform clustering operations. Alternatively, the treemap representation [6] (see Figure 2(d)) employs embedded rectangular shapes to represent hierarchical and categorical information. The treemap layout is considered well suited to representing large datasets with full utilization of screen space. However, the embedded rectangular representation can hardly encode the hierarchical relationship between different groups.

Another common representation for social networks is the radar chart, or spiderweb chart [2]. It is capable of highlighting the most dominant elements and quantifying the proximity between two members. We propose a refined radial layout that is suitable for analyzing the mobile phone network. With our approach, a radar chart can be constructed with respect to the person of interest, and the influences from his or her communication network can be effectively quantified by a dynamically changing polar chart.

A challenging task in visualizing a social network is to depict its time-varying properties and derive useful insights. The time line is a natural way to represent time-varying data [13], building a one-to-one correspondence between the representation element and the data primitive. Recently, Bak et al. [1] presented a framework, called Growth Ring Maps, to analyze the spatiotemporal data of sensor logs.
This representation makes it possible for users to find similarities and extract patterns of interest in spatiotemporal data. Moody et al. [15] introduce an animated movie representation for changing social networks. Our approach leverages interaction to show the evolution of a communication network and the influence of each connection on one person.

3 APPROACH
We seek to design an effective visualization that allows for interactive analysis of the behavior of a communication network. Our approach emphasizes the connections between one subject and his or her direct or indirect social groups, based on the assumption that an individual's life style is a combination of relationships with related social groups. The social groups are detected using an improved version of the Girvan-Newman network clustering algorithm [9], and are formed into a hierarchical radial kd-tree. Subsequently, the tree is recursively mapped onto a uniformly subdivided ring. For a selected person, a radar chart is generated to characterize the influence of related groups by embedding the influence of every group on its polar coordinates. The definition of influence varies with the situation. The pipeline of our approach is shown in Figure 3.

3.1 Hierarchical Clustering
Existing approaches [10, 12, 14, 18] are mostly based on the node-link layout, and may produce cluttered visualizations for large datasets. A social network is a typical case of a network with community structure, in which nodes within the same group are densely connected and edges between communities are sparser. Based on this feature, the Girvan-Newman algorithm detects communities in complex systems. It defines the edge betweenness of an edge as the number of shortest paths between pairs of nodes that run along that edge. It then searches for the edge with the largest betweenness value and removes it.
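The removal step described above can be sketched as follows. This is a minimal illustration of the standard, unweighted Girvan-Newman step, not the authors' implementation; the adjacency-dictionary representation and the function names `edge_betweenness` and `remove_top_edge` are assumptions made for the example.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness for an undirected, unweighted graph.

    `adj` maps each node to the set of its neighbours (hypothetical format).
    Returns {frozenset({u, v}): number of shortest paths using that edge}.
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist, sigma, preds = {s: 0}, {s: 1.0}, {s: []}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing path credit among predecessors.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += credit
                delta[v] += credit
    # Every unordered node pair was visited from both of its endpoints.
    return {edge: total / 2.0 for edge, total in score.items()}


def remove_top_edge(adj):
    """Remove one edge with the largest betweenness and return it."""
    scores = edge_betweenness(adj)
    edge = max(scores, key=scores.get)
    u, v = tuple(edge)
    adj[u].discard(v)
    adj[v].discard(u)
    return edge


# Toy graph (made up for illustration): two triangles joined by the bridge 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
removed = remove_top_edge(toy)  # the bridge carries every cross-triangle path
```

In the toy graph, the bridge is the only route between the two triangles, so it receives the highest score and is removed first, which splits the graph into its two communities.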
Recursively applying this operation yields a binary tree whose leaves are the nodes of the network. All clustering results produced during the process are stored as a list for further reference. We transform the original CDR network into a weighted simple graph, which the Girvan-Newman algorithm can process as a multigraph [16]. However, this algorithm
is time-consuming, with a time complexity of O(kmn), where k is the number of removed edges, m is the number of edges in the network, and n is the number of nodes. In order to reduce the computation time, we compute the edge betweenness in three stages. First, the multigraph is transformed into a simple graph by discarding the extra edges wherever a node pair is connected by multiple edges. The edge betweenness is then obtained by means of breadth-first search. Second, the betweenness of every edge in the multigraph is increased by the number of edges per node pair along the path. Third, we remove all edges whose betweenness is maximal, instead of removing only one edge as in the traditional algorithm. The modified algorithm has a time complexity of O(k'm'n), where m' is the number of edges in the simple graph and k' is the number of iterations.

Figure 3: The pipeline of our approach. Stages shown: a network from CDR data (fields: from, to, date time, duration); hierarchical clustering; constructing the radial tree; computing closeness values with respect to groups; adjusting the radial tree; spiral view; statistic view (calling time, listening time).

3.2 Generating the radial layout
After clustering, each detected group is treated independently as an axis along which the influence on one subject is measured. The influence can be a variable with statistical or evolutionary properties, and can change over time. We choose the radial layout because it saves space and is well suited to showing a large number of nodes compared with a one-way layout, such as the traditional tree layout. Layouts such as 2D force-directed layouts make the space as compact as possible, but they confuse the user when there are too many nodes. To allow for effective analysis and visualization of a large tree (network), we construct a radial layout for the tree in three steps.
First, all leaf nodes of the tree are sequentially and uniformly distributed on a ring. Each child node corresponds to an interval of the ring. By recursively merging the child nodes, the tree is reformulated into a radial tree view. With this procedure, the ring is subdivided into a hierarchical structure. The length of each ring patch encodes the size of the clustered group. The root node is the center of the ring. A user who wants to see what his or her social groups are like traverses the radial tree starting from the root node. The closer groups are to the root node, the more pronounced the differences between them.

3.3 Generating the spiral view
A user typically cares about who the closest friends are, and who the second closest friends are, whereas the radial layout in Section 3.2 only shows an overview of all subjects. This overview is not intuitive for that task, and the user needs extra time to explore it. We introduce a spiral view, in which a subject is placed in the center, and directly or indirectly connected groups are lined up along a spiral according to their social relationship with the subject. In the hierarchical tree produced by the Girvan-Newman algorithm, the closer two persons are, the nearer their hierarchical positions are. Thus, the spiral can be formed directly from the radial tree. The leaf nodes in the sibling branch of the center person are the closest; this sibling branch is cut out to form a community and placed first on the spiral. The next community, placed second on the spiral, is the sibling branch of the center person's parent node, and so on.

3.4 Statistic view
Better understanding of the social behaviors of a particular group is one of the major goals of mobile phone network analysis. By analyzing trends in the duration and frequency of calls, it is possible to obtain information about the characteristics of human behavior patterns [3, 5].
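The radial subdivision of Section 3.2 and the spiral ordering of Section 3.3 can both be read off the clustering tree. The sketch below is an illustration under stated assumptions rather than the paper's code: the tree is encoded as nested tuples with subject IDs as leaves, and the names `radial_arcs` and `spiral_communities`, as well as the toy tree, are hypothetical.

```python
def leaves(node):
    """List the leaf IDs under `node` in left-to-right order."""
    if not isinstance(node, tuple):
        return [node]
    out = []
    for child in node:
        out.extend(leaves(child))
    return out


def radial_arcs(tree):
    """Return (node, start_deg, end_deg) for every subtree.

    Leaves are spread uniformly on the ring; an inner node covers the
    interval spanned by its leaves, so arc length encodes group size.
    """
    step = 360.0 / len(leaves(tree))
    arcs = []

    def visit(node, first):
        size = len(leaves(node))
        arcs.append((node, first * step, (first + size) * step))
        if isinstance(node, tuple):
            for child in node:
                visit(child, first)
                first += len(leaves(child))

    visit(tree, 0)
    return arcs


def spiral_communities(tree, subject):
    """Communities ordered from closest to farthest from `subject`.

    The sibling branch of the subject comes first, then the sibling
    branch of its parent, and so on up to the root.
    """
    path = []
    node = tree
    while isinstance(node, tuple):
        idx = next((i for i, c in enumerate(node) if subject in leaves(c)), None)
        if idx is None:
            raise KeyError(subject)
        siblings = [c for i, c in enumerate(node) if i != idx]
        path.append([leaf for s in siblings for leaf in leaves(s)])
        node = node[idx]
    if node != subject:
        raise KeyError(subject)
    return list(reversed(path))


# Toy binary tree with made-up subject labels.
toy_tree = ((("s", ("a", "b")), "c"), ("d", "e"))
arcs = {node: (start, end) for node, start, end in radial_arcs(toy_tree)}
order = spiral_communities(toy_tree, "s")
```

For the toy tree, `order` lists the sibling branch of `s` first, then the sibling branch of its parent, and finally the branch split off at the root, which mirrors the placement along the spiral.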
For example, the approach proposed in [3] can be used to compare the communication behaviors between residents and urban communities.

Figure 4: Comparison of statistical representation views using the same data and the same closeness value model: (a) polyline view; (b) fan view. The closeness values for the blue and orange groups are not visible in the left picture, but are clearly visible in the right one.

Our approach favors effective analysis and visualization of the duration of calls for each group by using radar charts. From Section 3.2, we can conclude that one person's close contacts are
always located close to this person, which makes the measured information easy to read in a radar chart without visual loss, because all the related information lies close together. Along each axis, we place a point according to the duration. For visualization, one solution is to construct a polyline by connecting all the points in clockwise order. However, this representation can produce confusing results, because some of the groups might have no connection with the subject. For instance, in Figure 4, the group in blue has a long period of connection, while its two neighboring groups have no connection with the subject. Connecting the points on the axes of the three groups makes the duration of the blue group invisible.

We choose to represent the influence of the calling network with a radar chart representation. Instead of using connected lines, we use a fan view to represent the duration (Figure 4). The set of groups yields several separate fans around the center. When a fan is selected in the radial tree view, the corresponding branch of the hierarchical tree expands into new sub-fans. A calling network usually takes a two-way form, i.e., incoming and outgoing calls. To distinguish them, we decompose each fan into two parts, each of which represents one calling direction in a different color.

3.5 Interactive community classification
With the proposed approach, we can easily classify the network data into different communities, identify persons with similar life styles, and compare the communication networks of different persons. Generally speaking, a person has a fuzzy classification of his or her own social groups, and this prior social experience can be applied to CDR datasets. Imagine a scenario in which a company manager wants to know his communication time with all his clients and all his relatives, rather than with each individual client and relative. On the hierarchical tree view, groups can be divided or merged by adjusting the tree.
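The group-level totals behind the two-part fans, and behind the manager scenario above, can be sketched as a simple aggregation. The record layout `(caller, callee, duration_seconds)`, the `group_of` mapping, and the function name `group_call_time` are assumptions made for illustration; merging or splitting groups then amounts to changing `group_of`.

```python
from collections import defaultdict


def group_call_time(records, subject, group_of):
    """Sum the subject's call time per group, split by direction.

    records  : iterable of (caller, callee, duration_seconds) tuples
    group_of : maps a contact ID to the group chosen in the tree view
    Returns {group: [outgoing_seconds, incoming_seconds]}.
    """
    totals = defaultdict(lambda: [0, 0])
    for caller, callee, duration in records:
        if caller == subject and callee in group_of:
            totals[group_of[callee]][0] += duration  # subject called out
        elif callee == subject and caller in group_of:
            totals[group_of[caller]][1] += duration  # subject was called
    return dict(totals)


# Made-up records: the last one does not involve the subject and is ignored.
calls = [("m", "c1", 100), ("c2", "m", 40), ("m", "r1", 10), ("c1", "c2", 999)]
groups = {"c1": "clients", "c2": "clients", "r1": "relatives"}
fans = group_call_time(calls, "m", groups)
```

Each entry of `fans` holds the two values that one fan would encode: the outgoing part and the incoming part for that group.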
Interactive manipulation of the radial tree view enables the user to access a desired group and adjust the displayed levels of the binary tree. The user can expand or collapse the current tree nodes by simple interaction. Different colors are used to depict different subjects and groups.

4 RESULTS AND ANALYSIS
All results were generated on a PC with a quad-core Intel 2.66 GHz CPU and 4 GB of RAM, running 32-bit Microsoft Windows 7. The interface and visual analysis system were implemented with the Processing language [17]. To demonstrate the utility of our system, we conducted case studies on two CDR datasets. One is from the MIT Media Lab, collected in the Reality Mining experiment on one hundred subjects. Of these subjects, seventy-five are either students or faculty in the MIT Media Laboratory, and twenty-five are incoming students at the MIT Sloan business school. The other is the benchmark for the IEEE VAST 2008 challenge, generated by recording the phone calls from Isla Del Sueño over a ten-day period in June 2006. Each record in these two CDR datasets contains at least four fields: the two phone numbers (incoming and outgoing), the time, and the duration.

4.1 MIT Reality Mining Dataset
The MIT Reality Mining dataset records the call logs, Bluetooth devices within a proximity of approximately five meters, cell tower IDs, application usage, and phone status of 94 subjects from March 23, 2004 to June 14, 2005. After the cleaning process, we obtain a dataset that contains the social activities of 83 subjects and 3853 call records from August 2004 to March 2005. To reduce the computation time of the Girvan-Newman algorithm, we adopt the call minutes between pairs of people, including both incoming and outgoing call time, as the weight of the CDR call network. This is reasonable because the call minutes reflect the degree of communication. After performing this algorithm, we obtain the overview radial tree shown in Figure 1.
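The weighting described above, in which incoming and outgoing call minutes between a pair of people are added into a single edge weight, can be sketched as follows. The record layout `(caller, callee, duration_seconds)` and the name `call_minutes_graph` are assumptions for illustration, not the dataset's actual schema.

```python
from collections import defaultdict


def call_minutes_graph(records):
    """Collapse directed call records into a weighted simple graph.

    records : iterable of (caller, callee, duration_seconds) tuples
    Returns {frozenset({a, b}): total call minutes in both directions}.
    """
    weight = defaultdict(float)
    for caller, callee, seconds in records:
        if caller != callee:  # ignore self-calls, which carry no social tie
            weight[frozenset((caller, callee))] += seconds / 60.0
    return dict(weight)


# Made-up records: two calls between 1 and 2 (one per direction), one from 1 to 3.
edges = call_minutes_graph([(1, 2, 60), (2, 1, 120), (1, 3, 30)])
```

Using an unordered pair as the key is what turns the multigraph of individual calls into a simple graph with one weighted edge per pair of people.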
Then, the user can choose a group of interest by traversing the tree. During this traversal, the distribution of social communities can be quickly observed from the root node: the deeper a group node lies, the more specific the social relationship is. The biggest tree in Figure 1 has two branches from the root. This means that these people can be divided into two groups: one consists entirely of people from the Sloan Business School at MIT, and the other consists mostly of Media Lab graduate students, among whom ID 78 and ID 40 are two of the four Media Lab first-year graduate students. This finding is confirmed by the user identification survey. In this tree, the biggest group is represented by the ring in the upper right corner.

Because a user may already know roughly how many groups his or her social network can be divided into, the interactive hierarchical layout can be obtained quickly. The user can then choose the initial clusters from the hierarchical layout, which are shown as rings. Figure 5 was created by a user who wants to look at the communities at the level of Sloan Business School and Media Lab. If the user wants to look at the detailed hierarchy of the Media Lab group, he or she can first select the corresponding clusters and then unfold the hierarchy tree, as shown in Figure 5.

When there are too many people in a person's social network, observing the network directly overloads the viewer. The proposed spiral view helps the user to easily compare social relationships with multiple friends. Figure 6 shows the social groups of ID 11. The people related to ID 11 can be grouped into 14 communities according to how close they are to ID 11. The closest community includes ten people (ID 4, 5, 8, 12, 13, 31, 23, 60, 102, and 104), whose social relationships among themselves can be observed in the radial tree layout.
In a similar manner, we can find the second closest community, which contains only ID 106.

Based on the interactively determined level of the social communities, a radial statistical chart can be created to reveal the underlying patterns. After selecting the node with ID 8, Figure 7 shows the ratio of incoming call time (colored in orange) and outgoing call time (colored in green) with every person in his or her social network from March 2004 to June 2005, together with the ratio of total call time over the same period. From this result, we can see that both ID 8 and his or her contacts tend to initiate calls to each other. Moreover, the persons near ID 8 have more call time, which means that their social relationship is much closer.

4.2 IEEE VAST Challenge Dataset
Compared with the MIT Reality Mining dataset, the CDR dataset from the IEEE VAST Challenge involves up to 400 persons. Here, we focus on how to reveal the statistical information. The radial tree view in Figure 8 clearly reveals the social communities of ID 2 in the compact social network. From this result, we can find two groups of people connected with ID 2. One has far more calls in the daytime than at night, while the other shows the opposite pattern. We conjecture that the former group is his or her work community, and the latter consists of private friends.

Comparing different people's social networks is a very important task in analyzing social behaviors, but it is difficult to achieve due to the high-dimensional nature of the data. In contrast, our spiral layout intuitively reveals each person's network. With a side-by-side comparison of spiral views, the differences between people's networks can be quickly identified. Figure 9 shows the social networks of two persons (ID 2 and ID 5) who have similar connected IDs during day and night. By comparing the networks of other persons at the same level, we find that people at a similar level probably belong to the same group.
In this manner, we find that the MIT Reality Mining dataset also has this pattern.
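The day and night comparison used in this section can be sketched as a per-contact count. The window boundaries below follow the Figure 8 caption (8:00 to 18:29 for day, 18:30 to 7:29 for night); the record layout `(caller, callee, timestamp)`, the function names, and the choice to skip calls that fall outside both windows are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, time

DAY_START, DAY_END = time(8, 0), time(18, 30)  # day: 8:00 up to 18:29
NIGHT_END = time(7, 30)                        # night: 18:30 up to 7:29


def period(timestamp):
    """Classify a call as 'day', 'night', or None if in neither window."""
    t = timestamp.time()
    if DAY_START <= t < DAY_END:
        return "day"
    if t >= DAY_END or t < NIGHT_END:
        return "night"
    return None  # 7:30 to 7:59 is not covered by either window


def day_night_counts(records, subject):
    """Count the subject's calls with each contact, split by period.

    records : iterable of (caller, callee, datetime) tuples
    Returns {contact: Counter({'day': n, 'night': m})}.
    """
    counts = {}
    for caller, callee, timestamp in records:
        if subject not in (caller, callee):
            continue
        other = callee if caller == subject else caller
        label = period(timestamp)
        if label is None:
            continue
        counts.setdefault(other, Counter())[label] += 1
    return counts


# Made-up call records for a hypothetical subject "2".
sample = [
    ("2", "7", datetime(2006, 6, 1, 9, 0)),
    ("7", "2", datetime(2006, 6, 1, 20, 15)),
    ("2", "9", datetime(2006, 6, 2, 7, 45)),
    ("3", "4", datetime(2006, 6, 2, 10, 0)),
]
split = day_night_counts(sample, "2")
```

Contacts whose counts are dominated by the day window versus the night window would correspond to the two kinds of groups discussed above.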
Figure 5: The interactive tree layout of the MIT Reality Mining dataset. (a) The three main groups: Sloan Business School; Media Lab graduate students; other participants, such as the lab staff. (b) The detailed social relationships among the Media Lab people.

5 CONCLUSION
In this paper we have presented a visual analytics system for CDR data. It focuses on the calling network of one subject, and can be regarded as an analysis technique for the relationship between an individual and his or her social groups. We suggest that the variation of an individual's social position is a combination of the changing closeness values between that individual and each related social group. We treat these groups as coordinates in a radar chart, and improve the chart by mapping the size of each group to the length of its arc. By using different views and hierarchies, the user can interactively investigate the calling network in a directed or undirected fashion. Two case studies were carried out on two datasets, demonstrating that our approach is promising for social network analysis concerning one subject.

In the future we expect to explore methods for detecting and visualizing person-related sub-graphs. In most application situations, analysis will focus on one subject's related groups instead of the entire network. We plan to use clustering algorithms that handle overlapping groups to obtain a better social group model. We will also test our approach on other kinds of small-world networks.

REFERENCES
[1] P. Bak, F. Mansmann, H. Janetzko, and D. Keim. Spatiotemporal analysis of sensor logs using growth ring maps. IEEE Transactions on Visualization and Computer Graphics, 15(6).
[2] J. M. Chambers, W. S. Cleveland, and P. A. Tukey. Graphical methods for data analysis. Duxbury Press.
[3] N. Eagle. Behavioral inference across cultures: Using telephones as a cultural lens. IEEE Intelligent Systems, 23(4):62-64.
[4] N. Eagle and A. S. Pentland. Reality mining: sensing complex social systems.
Personal and Ubiquitous Computing, 10(4).
[5] N. Eagle, A. S. Pentland, and D. Lazer. Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences, 106(36).
[6] T. L. Frantz and K. M. Carley. Treemaps as a tool for social network analysis, September.
[7] L. C. Freeman. Visualizing social networks. Journal of Social Structure, 1.
[8] M. Ghoniem, J.-D. Fekete, and P. Castagliola. A comparison of the readability of graphs using node-link and matrix-based representations. In INFOVIS 04: Proceedings of the IEEE Symposium on Information Visualization, pages 17-24.
[9] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America, 99(12).
[10] J. Heer and D. Boyd. Vizster: Visualizing online social networks. In INFOVIS 05: Proceedings of the 2005 IEEE Symposium on Information Visualization, page 5.
[11] C. A. Hidalgo and C. Rodriguez-Sickert. The dynamics of a mobile phone network. Physica A, 387(12).
[12] D. Holten. Hierarchical edge bundles: Visualization of adjacency relations in hierarchical data. IEEE Transactions on Visualization and Computer Graphics, 12(5).
[13] G. M. Karam. Visualization using timelines. In ISSTA 94: Proceedings of the 1994 ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM.
[14] P. Mika. Flink: Semantic web technology for the extraction and analysis of social networks. Journal of Web Semantics, 3, May.
[15] J. Moody, D. McFarland, and S. Bender-deMoll. Dynamic network visualization. American Journal of Sociology, 110(4).
[16] M. Newman. Analysis of weighted networks. Phys. Rev. E, 70(7):056131.
[17] C. Reas, B. Fry, and J. Maeda. Processing: A Programming Handbook for Visual Designers and Artists. The MIT Press.
[18] Z. Shen, K.-L. Ma, and T. Eliassi-Rad.
Visual analysis of large heterogeneous social networks by semantic and structural abstraction. IEEE Transactions on Visualization and Computer Graphics, 12(6).
[19] Z. Shen and K.-L. Ma. Mobivis: A visualization system for exploring mobile data. In Proceedings of IEEE Pacific Visualization Symposium. IEEE VGTC.
[20] Q. Ye, T. Zhu, D. Hu, B. Wu, N. Du, and B. Wang. Cell phone mini challenge award: Social network accuracy exploring temporal communication in mobile call graphs. IEEE International Symposium on Visual Analytics Science and Technology.
Figure 6: The spiral view of the MIT Reality Mining dataset. (a) The constructed tree; (b) the spiral social relationship of the person whose ID is 8.

Figure 7: The radar tree chart of ID 8 from the MIT Reality Mining dataset. (a) The ratio of incoming call time (colored in orange) and outgoing call time (colored in green) with every person in his or her social network. (b) The ratio of total call time over the same period. The numbers in the exterior circles represent the actual call time.
Figure 8: The radial tree view from the VAST Challenge dataset. (a) The overview of the whole dataset. (b) The radar tree chart of ID 2, where the orange color represents his or her ratio of call times with every person from 8:00 to 18:29, and the blue color represents his or her ratio of call times with every person from 18:30 to 7:29.

Figure 9: Comparison of the spiral social relationships of ID 2 and ID 5.
Hierarchical Data Visualization
Hierarchical Data Visualization 1 Hierarchical Data Hierarchical data emphasize the subordinate or membership relations between data items. Organizational Chart Classifications / Taxonomies (Species and
MobiVis: A Visualization System for Exploring Mobile Data
MobiVis: A Visualization System for Exploring Mobile Data Zeqian Shen Kwan-Liu Ma Visualization & Interface Design Innovation (VIDi) University of California, Davis ABSTRACT The widespread use of mobile
Component visualization methods for large legacy software in C/C++
Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University [email protected]
Map-like Wikipedia Visualization. Pang Cheong Iao. Master of Science in Software Engineering
Map-like Wikipedia Visualization by Pang Cheong Iao Master of Science in Software Engineering 2011 Faculty of Science and Technology University of Macau Map-like Wikipedia Visualization by Pang Cheong
VisCG: Creating an Eclipse Call Graph Visualization Plug-in. Kenta Hasui, Undergraduate Student at Vassar College Class of 2015
VisCG: Creating an Eclipse Call Graph Visualization Plug-in Kenta Hasui, Undergraduate Student at Vassar College Class of 2015 Abstract Call graphs are a useful tool for understanding software; however,
Graph/Network Visualization
Graph/Network Visualization Data model: graph structures (relations, knowledge) and networks. Applications: Telecommunication systems, Internet and WWW, Retailers distribution networks knowledge representation
Interactive information visualization in a conference location
Interactive information visualization in a conference location Maria Chiara Caschera, Fernando Ferri, Patrizia Grifoni Istituto di Ricerche sulla Popolazione e Politiche Sociali, CNR, Via Nizza 128, 00198
VISUALIZING HIERARCHICAL DATA. Graham Wills SPSS Inc., http://willsfamily.org/gwills
VISUALIZING HIERARCHICAL DATA Graham Wills SPSS Inc., http://willsfamily.org/gwills SYNONYMS Hierarchical Graph Layout, Visualizing Trees, Tree Drawing, Information Visualization on Hierarchies; Hierarchical
Hierarchy and Tree Visualization
Hierarchy and Tree Visualization Definition Hierarchies An ordering of groups in which larger groups encompass sets of smaller groups. Data repository in which cases are related to subcases Hierarchical
Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network
, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
Data Mining. Cluster Analysis: Advanced Concepts and Algorithms
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based
HierarchyMap: A Novel Approach to Treemap Visualization of Hierarchical Data
P a g e 77 Vol. 9 Issue 5 (Ver 2.0), January 2010 Global Journal of Computer Science and Technology HierarchyMap: A Novel Approach to Treemap Visualization of Hierarchical Data Abstract- The HierarchyMap
Clustering & Visualization
Chapter 5 Clustering & Visualization Clustering in high-dimensional databases is an important problem and there are a number of different clustering paradigms which are applicable to high-dimensional data.
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION
A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS Kyoungjin Park Alper Yilmaz Photogrammetric and Computer Vision Lab Ohio State University [email protected] [email protected] ABSTRACT Depending
Visualization methods for patent data
Visualization methods for patent data Treparel 2013 Dr. Anton Heijs (CTO & Founder) Delft, The Netherlands Introduction Treparel can provide advanced visualizations for patent data. This document describes
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
Visualizing e-government Portal and Its Performance in WEBVS
Visualizing e-government Portal and Its Performance in WEBVS Ho Si Meng, Simon Fong Department of Computer and Information Science University of Macau, Macau SAR [email protected] Abstract An e-government
A Visualization Technique for Monitoring of Network Flow Data
A Visualization Technique for Monitoring of Network Flow Data Manami KIKUCHI Ochanomizu University Graduate School of Humanities and Sciences Otsuka 2-1-1, Bunkyo-ku, Tokyo, JAPAN [email protected]
Classifying Large Data Sets Using SVMs with Hierarchical Clusters. Presented by :Limou Wang
Classifying Large Data Sets Using SVMs with Hierarchical Clusters Presented by :Limou Wang Overview SVM Overview Motivation Hierarchical micro-clustering algorithm Clustering-Based SVM (CB-SVM) Experimental
Interactive Exploration of Decision Tree Results
Interactive Exploration of Decision Tree Results 1 IRISA Campus de Beaulieu F35042 Rennes Cedex, France (email: pnguyenk,[email protected]) 2 INRIA Futurs L.R.I., University Paris-Sud F91405 ORSAY Cedex,
Behavioral Entropy of a Cellular Phone User
Behavioral Entropy of a Cellular Phone User Santi Phithakkitnukoon 1, Husain Husna, and Ram Dantu 3 1 [email protected], Department of Comp. Sci. & Eng., University of North Texas [email protected], Department
Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics
Complex Network Visualization based on Voronoi Diagram and Smoothed-particle Hydrodynamics Zhao Wenbin 1, Zhao Zhengxu 2 1 School of Instrument Science and Engineering, Southeast University, Nanjing, Jiangsu
Exploration and Visualization of Post-Market Data
Exploration and Visualization of Post-Market Data Jianying Hu, PhD Joint work with David Gotz, Shahram Ebadollahi, Jimeng Sun, Fei Wang, Marianthi Markatou Healthcare Analytics Research IBM T.J. Watson
Voronoi Treemaps in D3
Voronoi Treemaps in D3 Peter Henry University of Washington [email protected] Paul Vines University of Washington [email protected] ABSTRACT Voronoi treemaps are an alternative to traditional rectangular
An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups
An approach of detecting structure emergence of regional complex network of entrepreneurs: simulation experiment of college student start-ups Abstract Yan Shen 1, Bao Wu 2* 3 1 Hangzhou Normal University,
How To Find Local Affinity Patterns In Big Data
Detection of local affinity patterns in big data Andrea Marinoni, Paolo Gamba Department of Electronics, University of Pavia, Italy Abstract Mining information in Big Data requires to design a new class
Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data
CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear
Hierarchical Data Visualization. Ai Nakatani IAT 814 February 21, 2007
Hierarchical Data Visualization Ai Nakatani IAT 814 February 21, 2007 Introduction Hierarchical Data Directory structure Genealogy trees Biological taxonomy Business structure Project structure Challenges
DICON: Visual Cluster Analysis in Support of Clinical Decision Intelligence
DICON: Visual Cluster Analysis in Support of Clinical Decision Intelligence Abstract David Gotz, PhD 1, Jimeng Sun, PhD 1, Nan Cao, MS 2, Shahram Ebadollahi, PhD 1 1 IBM T.J. Watson Research Center, New
A Short Introduction on Data Visualization. Guoning Chen
A Short Introduction on Data Visualization Guoning Chen Data is generated everywhere and everyday Age of Big Data Data in ever increasing sizes need an effective way to understand them History of Visualization
Social Network Discovery based on Sensitivity Analysis
Social Network Discovery based on Sensitivity Analysis Tarik Crnovrsanin, Carlos D. Correa and Kwan-Liu Ma Department of Computer Science University of California, Davis [email protected], {correac,ma}@cs.ucdavis.edu
Big Data in Pictures: Data Visualization
Big Data in Pictures: Data Visualization Huamin Qu Hong Kong University of Science and Technology What is data visualization? Data visualization is the creation and study of the visual representation of
Visualizing Repertory Grid Data for Formative Assessment
Visualizing Repertory Grid Data for Formative Assessment Kostas Pantazos 1, Ravi Vatrapu 1, 2 and Abid Hussain 1 1 Computational Social Science Laboratory (CSSL) Department of IT Management, Copenhagen
Visualizing Web Navigation Data with Polygon Graphs
Visualizing Web Navigation Data with Polygon Graphs Jiyang Chen, Tong Zheng, William Thorne, Daniel Huntley, Osmar R. Zaïane and Randy Goebel Department of Computing Science University of Alberta, Edmonton,
GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL CLUSTERING
Geoinformatics 2004 Proc. 12th Int. Conf. on Geoinformatics Geospatial Information Research: Bridging the Pacific and Atlantic University of Gävle, Sweden, 7-9 June 2004 GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du [email protected] University of British Columbia
Adding New Level in KDD to Make the Web Usage Mining More Efficient
Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer [email protected] Soukaena Hassan Hashem PHD Student, Lecturer [email protected]
SAS VISUAL ANALYTICS AN OVERVIEW OF POWERFUL DISCOVERY, ANALYSIS AND REPORTING
SAS VISUAL ANALYTICS AN OVERVIEW OF POWERFUL DISCOVERY, ANALYSIS AND REPORTING WELCOME TO SAS VISUAL ANALYTICS SAS Visual Analytics is a high-performance, in-memory solution for exploring massive amounts
Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]
Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
A comparative study of social network analysis tools
A comparative study of social network analysis tools David Combe, Christine Largeron, Előd Egyed-Zsigmond and Mathias Géry International Workshop on Web Intelligence and Virtual Enterprises
Clustering. Adrian Groza. Department of Computer Science Technical University of Cluj-Napoca
Clustering Adrian Groza Department of Computer Science Technical University of Cluj-Napoca Outline 1 Cluster Analysis What is Datamining? Cluster Analysis 2 K-means 3 Hierarchical Clustering What is Datamining?
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University My Definition of Bioinformatics
Topic Maps Visualization
Topic Maps Visualization Bénédicte Le Grand, Laboratoire d'informatique de Paris 6 Introduction Topic maps provide a bridge between the domains of knowledge representation and information management. Topics
Character Image Patterns as Big Data
22 International Conference on Frontiers in Handwriting Recognition Character Image Patterns as Big Data Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng Kyushu University, Fukuoka,
Statistical Models in Data Mining
Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of
Temporal Visualization and Analysis of Social Networks
Temporal Visualization and Analysis of Social Networks Peter A. Gloor*, Rob Laubacher MIT {pgloor,rjl}@mit.edu Yan Zhao, Scott B.C. Dynes *Dartmouth {yan.zhao,sdynes}@dartmouth.edu Abstract This paper
How To Find Influence Between Two Concepts In A Network
2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
International Journal of Computer Science Trends and Technology (IJCST) Volume 3 Issue 3, May-June 2015
RESEARCH ARTICLE OPEN ACCESS Data Mining Technology for Efficient Network Security Management Ankit Naik [1], S.W. Ahmad [2] Student [1], Assistant Professor [2] Department of Computer Science and Engineering
Unit 4 DECISION ANALYSIS. Lesson 37. Decision Theory and Decision Trees. Learning objectives:
Unit 4 DECISION ANALYSIS Lesson 37 Learning objectives: To learn how to use decision trees. To structure complex decision making problems. To analyze the above problems. To find out limitations & advantages
Cluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and Algorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 Outline Prototype-based Fuzzy c-means
Cluster Analysis for Evaluating Trading Strategies 1
CONTRIBUTORS Jeff Bacidore Managing Director, Head of Algorithmic Trading, ITG, Inc. [email protected] +1.212.588.4327 Kathryn Berkow Quantitative Analyst, Algorithmic Trading, ITG, Inc. [email protected]
Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro
Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,
An Interactive Visualization Tool for the Analysis of Multi-Objective Embedded Systems Design Space Exploration
An Interactive Visualization Tool for the Analysis of Multi-Objective Embedded Systems Design Space Exploration Toktam Taghavi, Andy D. Pimentel Computer Systems Architecture Group, Informatics Institute
Group CRM: a New Telecom CRM Framework from Social Network Perspective
Group CRM: a New Telecom CRM Framework from Social Network Perspective Bin Wu Beijing University of Posts and Telecommunications Beijing, China [email protected] Qi Ye Beijing University of Posts and Telecommunications
Clustering Data Streams
Clustering Data Streams Mohamed Elasmar Prashant Thiruvengadachari Javier Salinas Martin [email protected] [email protected] [email protected] Introduction: Data mining is the science of extracting
Dmitri Krioukov CAIDA/UCSD
Hyperbolic geometry of complex networks Dmitri Krioukov CAIDA/UCSD [email protected] F. Papadopoulos, M. Boguñá, A. Vahdat, and kc claffy Complex networks Technological Internet Transportation Power grid
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
Final Project Report
CPSC545 Introduction to Data Mining by Prof. Martin Schultz & Prof. Mark Gerstein Final Project Report Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Pseudogenes
Visual Data Mining with Pixel-oriented Visualization Techniques
Visual Data Mining with Pixel-oriented Visualization Techniques Mihael Ankerst The Boeing Company P.O. Box 3707 MC 7L-70, Seattle, WA 98124 [email protected] Abstract Pixel-oriented visualization
ProM 6 Exercises. J.C.A.M. (Joos) Buijs and J.J.C.L. (Jan) Vogelaar {j.c.a.m.buijs,j.j.c.l.vogelaar}@tue.nl. August 2010
ProM 6 Exercises J.C.A.M. (Joos) Buijs and J.J.C.L. (Jan) Vogelaar {j.c.a.m.buijs,j.j.c.l.vogelaar}@tue.nl August 2010 The exercises provided in this section are meant to become more familiar with ProM
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
Fast Matching of Binary Features
Fast Matching of Binary Features Marius Muja and David G. Lowe Laboratory for Computational Intelligence University of British Columbia, Vancouver, Canada {mariusm,lowe}@cs.ubc.ca Abstract There has been
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
CS171 Visualization. The Visualization Alphabet: Marks and Channels. Alexander Lex [email protected]. [xkcd]
CS171 Visualization Alexander Lex [email protected] The Visualization Alphabet: Marks and Channels [xkcd] This Week Thursday: Task Abstraction, Validation Homework 1 due on Friday! Any more problems
Visualization Techniques in Data Mining
Tecniche di Apprendimento Automatico per Applicazioni di Data Mining Visualization Techniques in Data Mining Prof. Pier Luca Lanzi Laurea in Ingegneria Informatica Politecnico di Milano Polo di Milano
IC05 Introduction on Networks &Visualization Nov. 2009. <[email protected]>
IC05 Introduction on Networks &Visualization Nov. 2009 Overview 1. Networks Introduction Networks across disciplines Properties Models 2. Visualization InfoVis Data exploration
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
An example. Visualization? An example. Scientific Visualization. This talk. Information Visualization & Visual Analytics. 30 items, 30 x 3 values
Information Visualization & Visual Analytics Jack van Wijk Technische Universiteit Eindhoven I-science for Astronomy, October 13-17, 2008 Lorentz center, Leiden An example: 30 items, 30 x 3 values
Graph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework
An analysis of suitable parameters for efficiently applying K-means clustering to large TCPdump data set using Hadoop framework Jakrarin Therdphapiyanak Dept. of Computer Engineering Chulalongkorn University
TOP-DOWN DATA ANALYSIS WITH TREEMAPS
TOP-DOWN DATA ANALYSIS WITH TREEMAPS Martijn Tennekes, Edwin de Jonge Statistics Netherlands (CBS), P.O. Box 4481, 6401 CZ Heerlen, The Netherlands [email protected], [email protected] Keywords: Abstract:
Segmentation of building models from dense 3D point-clouds
Segmentation of building models from dense 3D point-clouds Joachim Bauer, Konrad Karner, Konrad Schindler, Andreas Klaus, Christopher Zach VRVis Research Center for Virtual Reality and Visualization, Institute
Visualization of Software
Visualization of Software Jack van Wijk Plenary Meeting SPIder Den Bosch, March 30, 2010 Overview Software Vis Examples Hierarchies Networks Evolution Visual Analytics Application data Visualization images
BIG DATA VISUALIZATION. Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou
BIG DATA VISUALIZATION Team Impossible Peter Vilim, Sruthi Mayuram Krithivasan, Matt Burrough, and Ismini Lourentzou Let's begin with a story Let's explore Yahoo's data! Dora the Data Explorer has a new
A SIMULATOR FOR LOAD BALANCING ANALYSIS IN DISTRIBUTED SYSTEMS
Mihai Horia Zaharia, Florin Leon, Dan Galea (3) A Simulator for Load Balancing Analysis in Distributed Systems in A. Valachi, D. Galea, A. M. Florea, M. Craus (eds.) - Tehnologii informationale, Editura
International Journal of Software and Web Sciences (IJSWS) www.iasir.net
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) ISSN (Print): 2279-0063 ISSN (Online): 2279-0071 International
BiCluster Viewer: A Visualization Tool for Analyzing Gene Expression Data
BiCluster Viewer: A Visualization Tool for Analyzing Gene Expression Data Julian Heinrich, Robert Seifert, Michael Burch, Daniel Weiskopf VISUS, University of Stuttgart Abstract. Exploring data sets by
Social Media Mining. Data Mining Essentials
Introduction The data production rate has increased dramatically (Big Data), and we are able to store much more data than before, e.g., purchase data, social media data, mobile phone data Businesses and customers
Visualization Quick Guide
Visualization Quick Guide A best practice guide to help you find the right visualization for your data WHAT IS DOMO? Domo is a new form of business intelligence (BI) unlike anything before an executive
Tutorial for proteome data analysis using the Perseus software platform
Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information
Business Intelligence and Process Modelling
Business Intelligence and Process Modelling F.W. Takes Universiteit Leiden Lecture 2: Business Intelligence & Visual Analytics BIPM Lecture 2: Business Intelligence & Visual Analytics 1 / 72 Business Intelligence
Introduction of Information Visualization and Visual Analytics. Chapter 7. Trees and Graphs Visualization
Introduction of Information Visualization and Visual Analytics Chapter 7 Trees and Graphs Visualization Overview: Motivation; Trees Visualization; Graphs Visualization Motivation: Often datasets contain
Classification algorithm in Data mining: An Overview
Classification algorithm in Data mining: An Overview S.Neelamegam #1, Dr.E.Ramaraj *2 #1 M.phil Scholar, Department of Computer Science and Engineering, Alagappa University, Karaikudi. *2 Professor, Department
Visualizing Large Graphs with Compound-Fisheye Views and Treemaps
Visualizing Large Graphs with Compound-Fisheye Views and Treemaps James Abello 1, Stephen G. Kobourov 2, and Roman Yusufov 2 1 DIMACS Center Rutgers University {abello}@dimacs.rutgers.edu 2 Department
Exercise 1: How to Record and Present Your Data Graphically Using Excel Dr. Chris Paradise, edited by Steven J. Price
Biology 1 Exercise 1: How to Record and Present Your Data Graphically Using Excel Dr. Chris Paradise, edited by Steven J. Price Introduction In this world of high technology and information overload scientists
