The Impact of Social Affinity on Phone Calling Patterns: Categorizing Social Ties from Call Data Records
Sara Motahari, Sprint Labs, Burlingame, CA
Sandeep Appala, Carnegie Mellon University, Moffett Field, CA
Ole J. Mengshoel, Carnegie Mellon University, Moffett Field, CA
Luca Zoia, Carnegie Mellon University, Moffett Field, CA
Phyllis Reuther, Sprint Labs, Burlingame, CA
Jay Shah, Carnegie Mellon University, Moffett Field, CA

ABSTRACT
Social ties defined by phone calls made between people can be grouped into various affinity networks, such as family members, utility networks, friends, coworkers, etc. An understanding of call behavior within each social affinity network and the ability to infer the type of a social tie from call patterns are invaluable for various industrial purposes. For example, the telecom industry can use such information for consumer retention, targeted advertising, and customized services. In this paper, we analyze the patterns of 4.3 million phone call data records produced by 360,000 subscribers from two California cities. Our findings can be summarized as follows. We reveal significant differences among affinity networks in terms of several call attributes. For example, members within the family network generate the highest average number of calls. Despite the differences between the two cities, they show similar phone call behaviors for a given affinity network. We identify specific features that model statistically meaningful changes in call patterns and can be used for prediction and classification of affinity networks, and we also find correlations between the features associated with call behavior. For example, when subscribers call each other after a long time, their calls tend to take longer. This knowledge leads to discussions of proper machine learning classification approaches as well as promising applications in telecom and security.
Categories and Subject Descriptors
G.3 [Probability and Statistics]: Statistical Computing; J.4 [Social and Behavioral Sciences]: Sociology.

General Terms
Measurement, Experimentation.

Keywords
Call Data, Social Networks, Call Behavior, Social Tie Inference

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. The 6th SNA-KDD Workshop '12 (SNA-KDD '12), August 12, 2012, Beijing, China. Copyright 2012 ACM.

1. INTRODUCTION
Today's pervasive use of mobile phones produces huge volumes of call data records that can potentially provide a significant amount of valuable information about various patterns of human interaction, social relations, mobility, etc. In particular, being able to understand, distinguish, and discover social affinities and their nature from people's call data records is invaluable in many practical areas, such as planning advanced marketing strategies, dynamic pricing, customized services, and recommendation services. Telecom providers can use this information for customer retention and targeted advertising. For example, wireless service providers can identify the family members that are not under the same family plan as potential customers, or identify subscribers within a network of non-subscriber friends to predict and prevent churn. The impact of social ties on customer attraction and retention is already known to telecom providers. For example, previous studies have shown that the number of customers who churn out of a service provider's network depends on the number of their friends that have already churned [1].
The strength of a social tie is also typically proportional to the degree of trust [26], and trust plays a crucial role in security. Consequently, this line of research may also have applications to security. Despite the importance of such information about social relations and call behavior, there is a major gap in previous studies when it comes to the analysis of social networks based on phone calls. Previous work has mainly focused on the structure of the obtained social graphs and the communities inside them, such as topological properties [1][12][8][13]. Some previous studies have also tried to predict the existence of links and social connections [8][3][14]. Such analyses, while essential to our understanding of social networks and call graphs in general, do not provide detailed insights on how to characterize, categorize, or classify the type of a social tie from calling patterns. In this paper, we analyze a set of features that abstract calling patterns between subscribers, and investigate their ability to discriminate between different affinity networks. We collect and process Call Detail Records (CDRs) of a major wireless service provider from two different cities in California: a rural city (Modesto) with a relatively small population situated in an agricultural region, and an urban city (San Francisco) with a larger and more diverse population, located close to Silicon Valley and two other cities (Oakland and San Jose). We assign each subscriber to one of several types of social ties (see also [21]
[23]) such as family, toll free, utility services, or other, based on our access to information sources about their registration accounts. We define each social group as an affinity network. In other words, according to our terminology an affinity network is a group of subscribers in which all the subscribers have a common social tie. We then identify and calculate various features that model call behavior and patterns such as the frequency, length, timing, and symmetry of calls. The results show that the affinity networks exhibit meaningful and statistically significant differences in terms of the identified features. For example, family members call each other more often, their calls are shorter, and they have a more mutual (outgoing versus incoming) tie compared to other affinity networks. We will see that these distinctive patterns are consistent across the two geographical areas of Modesto and San Francisco despite the differences in population characteristics between the two cities. For example, in Modesto 39.3% of households contain children under the age of 18, while this is the case for 18.4% of households in San Francisco. We highlight some of the underlying correlations between these features. For example, when subscribers call each other after a long time, they tend to make longer calls. All these findings have implications for the predictability and classifiability of social ties, as well as for which machine learning techniques can be fruitfully used, as we also discuss in this paper. After a brief review of previous work in Section 2, we describe the dataset that was used in this study in Section 3. Section 4 explains our methodology, and Section 5 summarizes the empirical results. Highlights of the results and their implications are discussed in Section 6.

2. SOCIAL NETWORK AND CALL DATA ANALYSIS
Social networks have been analyzed from many perspectives.
A number of recent studies have specifically used mobile call graph data to investigate and characterize the social interactions of subscribers [1][12], the evolution of social groups, and the adoption of new products and services [6]. The main focus of previous work on social networks (whether from phone calls or other sources of data) has been on the structure of the obtained social graphs, including topological properties, degree distributions, core clusters, strongly connected components, extraction of communities, and community structure identification [1][12][8][13]. In particular, graph partitioning, such as spectral clustering, is a popular approach for studying community structure in graphs [16][18][19][21]. Several previous studies have tried to predict the existence of social connections in different contexts, for example, by considering mutual connections on Facebook and social networking sites, or by considering proximity patterns on university campuses and other specific environments [8][3][14]. This issue has also been explored in the form of predicting missing and future links in co-authorship networks [5], phone call graphs [5][4], and simulated social graphs. The issues of node (or vertex) partitioning [2] and, more recently, link (or edge) partitioning [21][23] have also been studied in network science. When node partitioning is performed, each node is assigned to a partition or class. For edge partitioning, each edge is assigned to a partition or class. The benefit of edge partitioning compared to node partitioning is that it can model situations in which nodes do not neatly separate into disjoint, non-overlapping classes. As we mentioned, such studies have been essential to our understanding of social networks. However, they do not give us information about call patterns with respect to the type of social relations, and do not enable us to infer the affinity networks. This is the focus of this paper.
Specifically, we focus on the characteristics of the edges of a social network induced by CDRs instead of focusing on node characteristics as done in most previous work.

3. DATA SET
The data set we have used in our analysis contains CDRs of mobile subscribers in the cities of Modesto and San Francisco, in the state of California, as shown in Figure 1. The geographic locations of the base stations in downtown Modesto and San Francisco were taken as reference points. From each reference point, all the regions within a radius of 20 miles were covered and the data from their base stations were collected. The CDRs were collected by the telecommunication service provider Sprint. Our data collection methodology resulted in millions of phone calls between subscribers in these two cities from 30 consecutive days in the month of October.

3.1 Call Detail Records
CDRs are collected from various base stations and stored in a distributed file system warehouse. Each cell phone call of a subscriber is saved as a record that contains the following information: unique subscriber ID and phone number of the caller; unique subscriber ID and phone number of the call receiver; date and time of the call's initiation; date and time of the call's end; direction of the call (outgoing versus incoming); switch ID; cell tower ID; sector ID. From the CDRs, we constructed, as further discussed in Section 4, a social network that represents each mobile user as a vertex and the calls between a pair of users as an edge.

3.2 Population Characteristics
Modesto and San Francisco are rural and urban cities, respectively. They have different population sizes and demographics. Below, we summarize some of their characteristics.

3.2.1 Modesto
Modesto has a population of about 200,000 and the population density is 5,423.4 people per square mile (2,094.0/km²). Ninety-eight percent of the population lives in a household.
There are about 69,000 households, out of which 39.3% (~27,000) have children under the age of 18 living in the house; 48.1% (~33,000) are married couples living together; the rest is a householder (male or female) living alone. The average family size is [...]. The population is spread out with 26.8% under the age of 18 (~54,000), 10.4% aged 18 to 24 (~20,000), 26.4% aged 25 to 44 (~53,000), 24.7% aged 45 to 64 (~49,000), and 11.7% aged 65 years or older (~23,000). The median age is 34.2 years. For every 100 females there are 95.0 males. For every 100 females age 18 and over, there are 91.5 males. Our source is the United States Census for the year 2010.

3.2.2 San Francisco
San Francisco has a population of around 805,000 and the population density is 17,160 people per square mile (6,632/km²). Ninety-seven percent of the population lives in a household. There are about 345,000 households, out of which 18.4% (~63,000) have children under the age of 18 living in them; 31.6% (~109,000) are married couples living together; the rest is a householder (male or female) without another present. The average family size is [...]. The population is spread out with 13.4% under the age of 18 (~107,000), 9.6% aged 18 to 24 (~77,000), 37.5% aged 25 to 44 (~301,000), 25.9% aged 45 to 64 (~208,000), and 13.6% aged 65 years or older (~109,000). The median age is 38.5 years. For every 100 females there are [...] males. For every 100 females age 18 and over, there are [...] males.

Figure 1. Areas of interest: San Francisco and Modesto

4. METHODOLOGY
In this section, we explain our methodology of processing and analyzing the CDRs as well as identifying features.

4.1 Anonymization
Aside from the security measures and control of access to subscriber information, the CDRs were anonymized: the originating and destination phone numbers and identification numbers were encrypted using hashing. Hence, no personal identification was accessible. The subscriber numbers tied to a family plan account were also anonymized using hashes. In addition, all our results are presented as aggregates calculated for the overall population, and no individual subscriber is pinpointed in the study.

4.2 Identification of Affinity Networks
As we mentioned, a social affinity network may be a network of friends, family members, coworkers, etc. The information about the relation types between subscribers is not provided in the CDRs.
Therefore, we had to use outside information to identify the social ties without compromising the privacy of individual and personal accounts. We grouped the social ties into the following affinity types:

Family: for every primary subscriber, other subscribers belonging to the primary subscriber's family plan were identified. If any two subscribers were part of a common family account in a family plan, they were identified as family members.
Toll: the numbers starting with [...] were categorized as toll free numbers.
Utility: we considered a pair of subscribers to be part of a utility network if one of the subscribers is a business establishment in Modesto or San Francisco. We created a limited but representative list of business establishments by scraping the Web.
Others: all the other numbers, which may include friends, professional colleagues, or any other personal relationship that a subscriber could hold.

Above, we discuss toll free numbers (i.e., nodes), but in Table 1 we present toll free edges. This reflects the fact that the node partitioning into {Family, Toll, Utility, Others}, as introduced above, induces a corresponding edge partitioning. In other words, for each of the affinity edge types, we have a rule that says how it is defined in terms of the affinity node types. For example, if one node is a toll node, then all adjacent edges (both incoming and outgoing) are toll edges. By considering all the edges of one affinity type, in our case {Family, Toll, Utility, Others}, we obtain an affinity network. We could potentially identify additional affinity networks (e.g., grouping coworkers by accessing domain and corporate account information). However, the use of more information from Sprint or other external sources could compromise the privacy and confidentiality of the subscribers. Therefore, we limited the identified networks to the above ones.
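The way a node partitioning induces an edge partitioning can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function name `edge_affinity`, the input structures, and in particular the order in which the rules are checked (Family, then Toll, then Utility, then Others) are assumptions made for illustration, since the text only states the rule for toll nodes explicitly.

```python
from typing import Dict, Optional, Set


def edge_affinity(u: str, v: str,
                  family_plan: Dict[str, str],
                  toll_nodes: Set[str],
                  utility_nodes: Set[str]) -> str:
    """Label the undirected edge {u, v} from node-level information.

    family_plan maps a (hashed) subscriber ID to a family plan ID;
    toll_nodes and utility_nodes are sets of (hashed) numbers.
    The precedence of the rules below is an assumption.
    """
    plan_u: Optional[str] = family_plan.get(u)
    plan_v: Optional[str] = family_plan.get(v)
    # Family edge: both endpoints share the same family plan account.
    if plan_u is not None and plan_u == plan_v:
        return "Family"
    # Toll edge: at least one endpoint is a toll free number.
    if u in toll_nodes or v in toll_nodes:
        return "Toll"
    # Utility edge: at least one endpoint is a listed business establishment.
    if u in utility_nodes or v in utility_nodes:
        return "Utility"
    return "Others"


# Hypothetical toy data: two subscribers on plan "p1", one toll free number.
plans = {"a": "p1", "b": "p1", "c": "p2"}
labels = [edge_affinity(x, y, plans, {"t"}, {"s"})
          for x, y in [("a", "b"), ("a", "t"), ("c", "s"), ("a", "c")]]
```

Collecting all edges that receive the same label yields one affinity network per label.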
Nevertheless, we will see below that the above grouping is enough for pinpointing and analyzing the call patterns and identifying meaningful features, which is the purpose of this study.

4.3 Sampling
We extracted CDRs for a set of subscribers from San Francisco and Modesto and ranked subscribers by their total number of calls. We then removed the outliers (top 10% and bottom 10%) and uniformly sampled 10,000 subscribers and their one-hop connections from this set. We then filtered all the CDRs from the two locations for this set of subscribers. The number of subscriber pairs in each edge partition (affinity network type) from each city is shown in Table 1. The table excludes self-edges, which may be used to reflect calls to voicemail.

Table 1. Statistics for call data record (CDR) data sets
Statistic | Modesto | San Francisco
Number of call data records in the sample | 1,966,022 (~1.97 million calls) | 2,333,826 (~2.33 million calls)
Number of subscribers, including primary and one-hop nodes | 203,[...] | [...],591
Number of social ties (edges) | 304,[...] | [...],236
Number of family member edges | 534 | 4,220
Number of utility edges | 25,232 | 4,111
Number of toll free edges | 19,528 | 27,618
Number of other edges | 258,[...] | [...]

4.4 Visualization and Feature Identification
We used a visualization tool for getting an intuitive understanding of the characteristics of different affinity networks. The visualization software NetEx [24] shows the vertices and the edges of the social graph using multiple GUI elements and
configurations. For example, as seen in Figure 2, the thickness of the edges can be varied based on the average number of calls. Such visualizations enabled us to identify and consider various features that may distinguish between affinity networks. In particular, we observed two categories of features:

1. Graph topological features: For example, in Figure 2(a), nodes 23, 24, 28, and 32 belong to the same family plan and have several mutual contacts. Figure 2(b) shows a random set of three utility numbers and their one-hop neighbors. The topology resembles an egocentric network where the calls occur between the utility service subscriber and all the subscriber's one-hop neighbors, but no calls occur between their one-hop neighbors. In addition, the utility numbers picked do not have any mutual contacts.
2. Call pattern features: These features relate to the number, frequency, length, and timing of the calls. For example, when the thickness of the line is selected to represent the number of calls, we observe thicker edges between many family members in Figure 2.

Here is how Figure 2(a) was created. All of the nodes belonging to a family plan were selected, and the number of mutual contacts between them was calculated by looking at their individual one-hop neighbors. For this visualization, a set of family nodes with high numbers of mutual contacts and total calls between them was chosen. However, we observed similar graph structures and features across other family sub-networks as well. The above discussion illustrates the identification of potential candidates for the features that model the differences between call graphs of different affinity networks. We now turn to the definition of the features that we picked for the purpose of this paper.

4.5 Definition of the Features
Before introducing the features that were used, we define call records and describe how the graphs were constructed.

Definition.
A call record represents a call and can be defined as a tuple $(u, v, t, d)$, where $u$ and $v$ are subscribers, $t$ is the call start time, and $d$ is the call duration. Note that this is a subset of a CDR as introduced in Section 3.1. The order of $u$ and $v$ in $(u, v, t, d)$ is important: $u$ is the subscriber initiating the call while $v$ receives the call. A set of call records is denoted $\Theta$. Among calls between $u$ and $v$, the outgoing calls from $u$ are defined as $\Theta_u(u,v) = \{\theta \in \Theta : \theta = (u, v, t, d)\}$, while the outgoing calls from $v$ are defined as $\Theta_v(u,v) = \{\theta \in \Theta : \theta = (v, u, t, d)\}$. A call record relation with all calls between $u$ and $v$ can now be defined as $\Theta(u,v) = \Theta_u(u,v) \cup \Theta_v(u,v)$. Below, in our features, we generally use $\Theta(u,v)$, since call direction generally does not matter except in one case (the engagement feature) where we use both $\Theta_u(u,v)$ and $\Theta_v(u,v)$.

Let $G = (V, E)$ be an undirected graph where $V$ represents the set of all subscribers in our sample, and $E$ represents the set of all undirected edges $\{u, v\}$ with $|\Theta(u,v)| > 1$. We define a social network $N$ that consists of $G$ along with a set $F$ of features defined for each edge: $N = (G, F)$. A key point in this paper is that we are, similar to previous work on link partitions [21] and link communities [23], focusing on features defined on edges $E$ rather than features defined on nodes $V$.

Figure 2. Average total call feature: (a) Family and (b) Utility affinity sub-networks plus one-hop neighbors

4.5.1 Features
Our features are defined for a given pair of subscribers, $\{u, v\} \in E$, and our goal is to characterize the nature of the social tie between $u$ and $v$ through the distribution of these features.

The total number of calls between subscribers $u$ and $v$ is defined as

$$nc(u,v) = |\Theta(u,v)|.$$

The average call duration between $u$ and $v$ is defined as

$$\bar{d}(u,v) = \frac{\sum_{(u,v,t,d) \in \Theta(u,v)} d}{|\Theta(u,v)|}.$$

Given a pair of subscribers $u$ and $v$, we consider the calls between them, arranged in ascending order by their start times. Now, consider any two consecutive call records $(u_i, v_i, t_i, d_i)$ and $(u_{i+1}, v_{i+1}, t_{i+1}, d_{i+1})$ from $\Theta(u,v)$, and define $n = |\Theta(u,v)|$. The average inter-call interval is then defined as

$$\bar{\delta}(u,v) = \frac{\sum_{i=1}^{n-1} (t_{i+1} - t_i)}{n - 1}.$$

The following feature is based on partitioning $\Theta(u,v)$ into weekday calls $\Theta_{wd}(u,v)$ and weekend calls $\Theta_{we}(u,v)$ such that $\Theta(u,v) = \Theta_{wd}(u,v) \cup \Theta_{we}(u,v)$ and $\Theta_{wd}(u,v) \cap \Theta_{we}(u,v) = \emptyset$. For the purpose of this paper, we treat Saturday and Sunday as weekend days, and other days as weekdays. The asymmetry between weekend and weekday calls, or weekday to weekend asymmetry, is defined with respect to the total number of calls:

$$ww(u,v) = \frac{\big|\, |\Theta_{wd}(u,v)| - |\Theta_{we}(u,v)| \,\big|}{|\Theta(u,v)|}.$$

If $ww(u,v) = 1$, there is maximal asymmetry, where calls take place on weekends or weekdays only. If $ww(u,v) = 0$, there is minimal asymmetry, with calls evenly distributed.

The engagement ratio feature gives a measure of the social interaction between $u$ and $v$:

$$eg(u,v) = 1 - \frac{\big|\, |\Theta_u(u,v)| - |\Theta_v(u,v)| \,\big|}{|\Theta(u,v)|}.$$

The purpose of this feature is to characterize the interaction patterns between subscribers $u$ and $v$. If $eg(u,v) = 1$, there is a symmetric interaction between the pair of subscribers. If $eg(u,v) = 0$, there are either only outgoing calls to $v$ (from $u$) or only incoming calls from $v$ (to $u$).

After we calculated features for all edges, we compared the features between different types of affinity networks and for different geographic regions (Modesto and San Francisco). We also explored the underlying correlations between the features. Statistically significant results are presented in Section 5.

4.6 Software Tools and Techniques
As the CDRs for subscribers typically amount to massive data sets, it can be computationally intensive to construct features from them. We used Hadoop, which implements the MapReduce parallel computing model [25], in order to process CDRs. Specifically, CDRs were stored in a Hadoop distributed file system and processed by feature construction algorithms written using the MapReduce framework. Figure 3 illustrates how we used the MapReduce mappers and reducers for our purpose. For every CDR, a mapper function emits a (key, value) pair, in our case one for each pair of subscribers, and the reducer function performs the aggregation operation needed for the desired features.
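The mapper and reducer roles described here, together with the edge features of Section 4.5, can be sketched in a few lines of single-process Python. This is an illustrative sketch only, not the Hadoop code used in the study. The function names (`map_record`, `reduce_pair`, `edge_features`) and the record layout `(caller, receiver, start, duration_seconds)` are hypothetical, and it assumes that inter-call intervals are measured between consecutive start times and reported in minutes.

```python
from collections import defaultdict
from datetime import datetime


def map_record(record):
    """Mapper: emit (unordered subscriber pair, call details) for one record."""
    caller, receiver, start, duration = record
    key = tuple(sorted((caller, receiver)))
    return key, (caller, start, duration)


def reduce_pair(key, values):
    """Reducer: aggregate all calls of one subscriber pair into edge features."""
    first, _second = key
    values = sorted(values, key=lambda call: call[1])  # ascending start time
    n = len(values)
    starts = [start for _, start, _ in values]
    # Average gap between consecutive call starts, in minutes (needs n > 1).
    avg_interval = None
    if n > 1:
        gaps = [(b - a).total_seconds() / 60 for a, b in zip(starts, starts[1:])]
        avg_interval = sum(gaps) / (n - 1)
    weekend = sum(1 for s in starts if s.weekday() >= 5)  # Saturday=5, Sunday=6
    weekday = n - weekend
    out_first = sum(1 for caller, _, _ in values if caller == first)
    out_second = n - out_first
    return {
        "total_calls": n,
        "avg_duration": sum(d for _, _, d in values) / n,
        "avg_intercall_interval": avg_interval,
        "ww_asymmetry": abs(weekday - weekend) / n,
        "engagement": 1 - abs(out_first - out_second) / n,
    }


def edge_features(records):
    """Group mapper output by key, then reduce each group (the shuffle step)."""
    grouped = defaultdict(list)
    for record in records:
        key, value = map_record(record)
        grouped[key].append(value)
    return {key: reduce_pair(key, vals) for key, vals in grouped.items()}


# Hypothetical toy records: three weekday calls and one Saturday call.
example = edge_features([
    ("a", "b", datetime(2024, 1, 1, 10, 0), 60),
    ("b", "a", datetime(2024, 1, 1, 11, 0), 120),
    ("a", "b", datetime(2024, 1, 2, 10, 0), 30),
    ("a", "b", datetime(2024, 1, 6, 10, 0), 30),
])
```

Sorting the pair before using it as a key makes the edge undirected, while keeping the caller in the value preserves the direction needed for the engagement ratio.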
In other words, the key emitted by the mapper application is the subscriber pair (i.e., the source subscriber phone number and the destination subscriber phone number), the direction of the call (incoming or outgoing), and so forth. For every key (subscriber pair), the corresponding values from the CDR are emitted as a value. Once the information had been calculated using Hadoop, SQL scripts were written to mine the information accordingly in the database. The Java-based NetEx software [24] was interfaced with the database to present data for visualizations.

Figure 3. Call data record (CDR) processing using MapReduce and Hadoop

5. RESULTS
We analyzed the call behavior with respect to the above features from several perspectives: 1) the behavioral differences among different affinity networks, which are reflected in meaningful variations of their associated features; 2) dependencies between various characteristics of human call behavior; and 3) differences between geographic regions.

5.1 Differences between Affinity Networks
We here present the results of analyzing and comparing the above features with respect to different affinity networks in the two cities. We observed different call patterns reflected in various features, as explained in the following subsections.

5.1.1 Total Number of Calls
Most of the subscribers in a family network have a very high number of calls between each other. The average total number of calls is the highest between family members ([...] calls in a month) in Modesto, whereas the average total numbers of calls made to utility and toll free numbers have relatively low values of 2.48 and 2.29, respectively. Analysis of variance shows a significant difference between the family members and the others (Fisher's score F = 2946, p < 0.001, where p is the p-value for statistical significance).
The average total number of calls made to family members in San Francisco is high, too (31.44), when compared to toll free and utility service numbers, for which the averages are 2.26 and [...]. The difference of the means between family members and others is again statistically significant (F = 20931, p < 0.001). No meaningful difference was found in the number of family member calls between the two cities. Figure 4 shows the mean value of the total number of calls made within different affinity networks in the Modesto and San Francisco regions.
Figure 4. Average number of calls

5.1.2 Engagement Ratio
The engagement ratio defines how reciprocal the call pattern between two subscribers is. The higher the engagement ratio, the more balance we observe in the number of outgoing and incoming calls. The average engagement ratios calculated for various affinity networks are shown in Figure 5. The family network has a high engagement ratio of 0.57, whereas the toll free and utility services have very low values of 0.05 and 0.14, respectively, in Modesto. The engagement ratio in San Francisco shows a similar value for the family network, namely 0.56, and low values of 0.03 and 0.07, respectively, for the toll free and utility service networks. An analysis of variance confirms a statistically meaningful difference between family members and other affinity networks (F = 27.3 and p < [...] for Modesto, and F = 480 and p < [...] for San Francisco). The above results suggest that family subscribers have a reciprocal social interaction with each other. In contrast, mostly outgoing calls are made to utility and toll free numbers; utility and toll free numbers do not reciprocate equally.

Figure 5. Average engagement ratio

5.1.3 Weekend versus Weekday
Figure 6 shows the percentage of the calls that were made on the weekends and Figure 7 shows the asymmetry between weekday and weekend calls for each affinity network. The main difference, perhaps, is observed in the asymmetry of weekday to weekend calls between family members compared to the rest (others, toll, and utility). Toll free and utility numbers are called much more during the weekdays. Family members are more likely to call each other on the weekends (F = 56.7 and p < [...] for Modesto, and F = 350 and p < [...] for San Francisco).

Figure 6. Average weekend percentage of the calls
Figure 7. Average weekday to weekend asymmetry

5.1.4 Call Duration
Figure 8 reports empirical results for average call duration.
The calls occurring between pairs of subscribers in a family network have small average call durations of [...] and [...] seconds in Modesto and San Francisco, respectively. The utility services and toll free networks have higher average call durations of [...] and [...] seconds in Modesto, and [...] and [...] seconds in San Francisco, respectively. Running an ANOVA analysis, we also observed a statistically meaningful difference between the call durations of the three groups of 1) family members, 2) toll free/utility network, and 3) others (F = 8.2 and p = [...] for Modesto, and F = 1302 and p < [...] for San Francisco). This suggests that the calls between family members are quicker and contain short conversations, whereas calls to business establishments and toll free numbers last for a substantially longer time. The mean values are depicted in Figure 8.
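The F values reported in this section come from analysis of variance. As a rough illustration of what such a statistic measures, the following Python sketch computes the F statistic of a one-way ANOVA from scratch. Treating the reported tests as one-way ANOVAs over groups of per-edge feature values is an assumption, the function name `one_way_anova_f` is hypothetical, and the numbers in the example are toy values, not data from the study.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups.

    F = (between-group mean square) / (within-group mean square).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy example with three small groups (for instance, a feature value per edge
# in three affinity networks); F is 13 for these values, up to rounding.
f_toy = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F indicates that the group means differ by much more than the spread inside each group would explain; the p-value is then obtained from the F distribution with (k - 1, n - k) degrees of freedom.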
Figure 8. Average call duration

5.1.5 Inter-call Interval
The calls occurring between subscribers in a family network have a smaller average inter-call interval of 3272 minutes (~2 days) in Modesto and 3586 minutes (~2.5 days) in San Francisco when compared to other affinity networks. The utility service networks have the highest inter-call intervals of 8361 minutes (~6 days) and 8589 minutes (~6 days) in Modesto and San Francisco, respectively, i.e., these calls occur very sporadically. This suggests that the subscribers belonging to a family network call each other more frequently compared to how often they call their utility or toll free numbers. The significance of the results was confirmed by an ANOVA analysis (F = 25.5 and p < [...] for Modesto, and F = 458 and p < [...] for San Francisco). The mean values of the inter-call interval for each affinity network are plotted in Figure 9.

Figure 9. Average inter-call interval

In the next subsection, we analyze the dependencies between the above features, and in Section 6 we discuss the meanings and implications of the results.

5.2 Correlations between Features
When it comes to feature selection, feature extraction, and the selection of the appropriate machine learning and data mining techniques for classification, it is important to be aware of the correlations among the features. The features that we analyzed in the previous subsection are not independent. We looked at the correlation matrix and investigated the statistically meaningful correlations. The following pairs of features were found to have significant correlations:

- The total number of calls and the average inter-call interval (ρ = -0.48): This correlation is relatively obvious and is due to the definition of these features. For the pairs with more calls during a given time interval, there need to be more frequent calls (i.e., shorter time intervals).
- The total number of calls and the weekday to weekend asymmetry (ρ = -0.31): The subscriber pairs that make more calls to each other also make more calls on the weekends (meaning a smaller asymmetry). This could be because family members make more calls on the weekends. We talk more about the cause versus effect issue in Section 6, as this discussion applies to most correlations found.
- The average call duration and the average inter-call interval (ρ = 0.21): People who call each other less frequently make longer calls. This effect is illustrated in Figure 10 and Figure 11.
- The total number of calls and the engagement ratio (ρ = 0.34): The subscriber pairs that make more calls have a more reciprocal call pattern.
- The weekday to weekend asymmetry and the engagement ratio (ρ = -0.36): The subscriber pairs with more calls on the weekends have more reciprocal call patterns.

The significance of the above correlations also holds for each city separately and is consistent across the two geographical regions. For example, Figures 10 and 11 show the average inter-call interval versus the average call duration in Modesto and San Francisco, respectively. We see that in both cities, the calls take longer as the time interval between them increases.

Figure 10. Average call duration (y-axis) as a function of inter-call interval (x-axis) in Modesto
Figure 11. Average call duration (y-axis) as a function of inter-call interval (x-axis) in San Francisco
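A correlation matrix over per-edge features, as used in this subsection, can be sketched as follows. The text does not state which correlation coefficient ρ denotes; the sketch assumes the Pearson coefficient. The function names `pearson` and `correlation_matrix`, the feature names, and the toy rows are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(rows, names):
    """Correlate every pair of named features; rows holds one dict per edge."""
    columns = {name: [row[name] for row in rows] for name in names}
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Toy edges in which more calls go together with shorter inter-call intervals.
toy_rows = [
    {"total_calls": 2, "avg_intercall_interval": 9000.0},
    {"total_calls": 5, "avg_intercall_interval": 6000.0},
    {"total_calls": 9, "avg_intercall_interval": 4000.0},
    {"total_calls": 30, "avg_intercall_interval": 1400.0},
]
toy_matrix = correlation_matrix(toy_rows, ["total_calls", "avg_intercall_interval"])
```

In this toy input the off-diagonal entry is negative, mirroring the sign of the first correlation in the list above; the magnitude has no relation to the reported values.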
6. DISCUSSION AND CONCLUSION
In this study, we aimed to address the current gap in our knowledge of human phone call behavior, one manifestation of ties in social networks. In particular, we have investigated how behaviors vary across different social affinity networks, which could include family members, coworkers, friends, service providers, customers, etc. We collected and processed Call Detail Records (CDRs) provided by Sprint from the two California cities of Modesto and San Francisco. We built a social graph with the edges (social ties) between two subscribers specified by phone calls between them. We then assigned each subscriber to one of four groups of social ties: family, toll free, utility services, or other, based on our access to information sources about their registration accounts. We defined each social group as an affinity network, in which all the subscribers have a certain common social tie. We then identified, calculated, analyzed, and compared various features that model call behavior and patterns such as the frequency, length, timing, and symmetry of the calls. Some key results and their implications can be summarized as follows:

Call patterns of family members in both cities indicate strong social ties between them, which is reflected in the total number of calls and the frequency of the calls. These two features show statistically significant differences between family members and all other affinity networks. Furthermore, subscribers belonging to a family network in Modesto and San Francisco make on average 77.31% and 75.4%, respectively, of their total calls to each other and not outside the family network; on average only 9.46% and 10.65%, respectively, of their total calls are made to the utility and toll free affinity networks combined.
Family members also call each other very frequently, with a small time interval between successive calls, whereas calls to utility numbers are sporadic, with long intervals between successive calls.

Family members show a more reciprocal and mutual call behavior. This is reflected in the strong engagement ratios of 57% and 56% in Modesto and San Francisco respectively. In contrast, most of the calls made to utility numbers are not reciprocated equally (this is measured by weak engagement ratios of 13% and 10% respectively).

While previous research [16][19] identified differences between communities in terms of personal network topologies and some behavioral characteristics, we found that the nature of social ties and their associated call patterns in Modesto and San Francisco have strong similarities. The two cities also show very similar variations in their related features across different affinity networks.

We found strong correlations between the features that model call patterns. Some of these correlations are rather intuitive. For example, the number of calls is, as expected, negatively correlated with the inter-call interval. However, some reveal interesting call behavior. For example, the subscribers who call each other more frequently make shorter calls. Currently, it is not clear whether such correlations are the cause or the effect of some of the differences across affinity networks. For example, the above correlation could be the cause or the effect of shorter calls among family members. This is a topic for future research.

Statistically significant features and their correlations have implications for research on prediction and classification of social ties. They mean that, given these features and the social graph, we should be able to classify each edge of the graph into a specific social affinity network and infer the type of relationship.
However, the dependencies among the features indicate that the classification approach would benefit from being able to capture such dependencies. This appears to favor, for example, random forests and Bayesian networks over simpler approaches such as single decision trees and naive Bayes classifiers.

As a limitation of this study, we should note that the affinity network types that we identified and used as our ground truth were inferred and not directly verified. For example, we assumed that family plans are shared among family members. While this is a common-sense assumption, we did not verify it. This means that non-family members who share a family plan introduce some noise into the dataset. There are similar limitations associated with the Utility, Toll, and Others edge types.

From an application point of view, there are several opportunities to build on and expand this work. Knowledge about social tie strength and type is invaluable for different research and industrial purposes, especially for telecom providers. For example, customer acquisition can be enhanced by focusing on family members of existing customers who are not on the provider's network; churn prevention can be enhanced by focusing on friends of churned customers; and search engines and mediating businesses that recommend services to subscribers, or customers to services, can refine and categorize their recommendations based on relationship type information. Furthermore, knowledge and modeling of the distinct call behavior of different social affinities, even without affinity prediction capabilities, can be useful in predicting how call patterns, and consequently the load on the network, may change in the future as a result of predicted acquisitions and churn. There are also applications of tie strength and type in the area of security.
For example, the strength of social ties is a useful indicator of trust in many real-world relationships [26], and trust plays a crucial role in security applications.

As future work, we intend to a) expand the types of social affinities that we considered and add coworkers, friends, etc. by using more information sources without invading subscriber privacy; b) investigate the impact of features associated with the social graph topology and with the location and proximity patterns of the subscribers [3]; c) include additional cities and geographical areas from different parts of the country and the world to obtain more diverse datasets; and d) apply appropriate classification techniques to classify the edges into affinity networks.

7. ACKNOWLEDGMENT

The authors would like to thank Kevin Leduc and Jason Powers, from Sprint Applied Research & Advanced Technology Labs, for providing technical and infrastructure support. We would also like to thank Sprint-Nextel and the Carnegie Mellon University - Silicon Valley community for their support of this work. This work is supported, in part, by NSF award CCF to Ole J. Mengshoel.
REFERENCES

[1] A. Nanavati, S. Gurumurthy, G. Das, D. Chakraborty, K. Dasgupta, S. Mukherjea, and A. Joshi. On the structural properties of massive telecom call graphs: findings and implications. In Proc. of 15th ACM CIKM, pages ,
[2] A. Clauset, C. Moore, and M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453, pages , May
[3] D. J. Crandall, L. Backstrom, D. Cosley, S. Suri, D. Huttenlocher, and J. Kleinberg. Inferring social ties from geographic coincidences. PNAS, 107, pages , December
[4] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A. L. Barabasi. Human mobility, social ties, and link prediction. In Proc. of KDD-11, pages ,
[5] D. Liben-Nowell and J. Kleinberg. The link prediction problem for social networks. In Proc. of the 12th International Conference on Information and Knowledge Management (CIKM-03),
[6] E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: user movement in location-based social networks. In Proc. of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, California, USA, August 21-24,
[7] G. Palla, A.-L. Barabasi, and T. Vicsek. Quantifying social group evolution. Nature, 446, pages , April
[8] H. Zhang and R. Dantu. Predicting social ties in mobile phone networks. In Proc. of 2010 IEEE International Conference on Intelligence and Security Informatics, pages 25-30,
[9] J. Resig, S. Dawara, C. Homan, and A. Teredesai. Extracting social networks from instant messaging populations. In Proc. of ACM SIGKDD, pages 22-25,
[10] J.-P. Onnela, J. Saramäki, J. Hyvönen, G. Szabó, D. Lazer, K. Kaski, J. Kertész, and A.-L. Barabási. Structure and tie strengths in mobile communication networks. PNAS, 104, pages , May
[11] K. Dasgupta, R. Singh, B. Viswanathan, D. Chakraborty, S. Mukherjea, A. A. Nanavati, and A. Joshi. Social ties and their relevance to churn in mobile telecom networks. In Proc. of EDBT-08, pages ,
[12] M. Seshadri, S. Machiraju, A. Sridharan, J. Bolot, C. Faloutsos, and J. Leskovec. Mobile call graphs: Beyond power-law and lognormal distributions. In Proc. of KDD-08, pages ,
[13] M. E. J. Newman. Detecting community structure in networks. The European Physical Journal B - Condensed Matter and Complex Systems, 38, pages , March 2004.
[14] M. C. Gonzalez, C. A. Hidalgo, and A.-L. Barabasi. Understanding individual human mobility patterns. Nature, 453(7196), pages ,
[15] N. Eagle, A. Pentland, and D. Lazer. Inferring friendship network structure by using mobile phone data. Proceedings of the National Academy of Sciences, 106(36), pages ,
[16] N. Eagle, Y.-A. de Montjoye, and L. M. A. Bettencourt. Community Computing: Comparisons between Rural and Urban Societies using Mobile Phone Data. In Proc. of CSE-09, pages ,
[17] P. Perona and W. T. Freeman. A factorization approach to grouping. In Proc. of European Conference on Computer Vision, pages ,
[18] S. Sarkar and K. L. Boyer. Quantitative measures of change based on feature organization: Eigenvalues and eigenvectors. Computer Vision and Image Understanding, 71, pages , July
[19] S. Isaacman, R. Becker, R. Cáceres, S. Kobourov, J. Rowland, and A. Varshavsky. A tale of two cities. In Proc. of HotMobile-10, pages
[20] T. Shi, M. Belkin, and B. Yu. Data Spectroscopy: Learning Mixture Models using Eigenspaces of Convolution Operators. In Proc. of ICML,
[21] T. S. Evans and R. Lambiotte. Line Graphs, Link Partitions, and Overlapping Communities. Phys. Rev. E, ,
[22] U. V. Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17, pages ,
[23] Y.-Y. Ahn, J. P. Bagrow, and S. Lehman. Link communities reveal multiscale complexity in networks. Nature, 466, pages , June
[24] M. Cossalter, O. J. Mengshoel, and T. Selker. Visualizing and understanding large-scale Bayesian networks. In Proc. of the AAAI-11 Workshop on Scalable Integration of Analytics and Visualization, pages 12-21,
[25] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. In Proc. of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6 (OSDI-04), San Francisco, CA,
[26] T. Hyun-Jin Kim, A. Yamada, V. Gligor, J. I. Hong, and A. Perrig. User-Controlled Trust Establishment through Visualization of Tie Strength. Technical report CMU-CyLab , Carnegie Mellon University, February 2011.
More informationA SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS INTRODUCTION
A SOCIAL NETWORK ANALYSIS APPROACH TO ANALYZE ROAD NETWORKS Kyoungjin Park Alper Yilmaz Photogrammetric and Computer Vision Lab Ohio State University park.764@osu.edu yilmaz.15@osu.edu ABSTRACT Depending
More informationSubgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro
Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,
More informationThree Effective Top-Down Clustering Algorithms for Location Database Systems
Three Effective Top-Down Clustering Algorithms for Location Database Systems Kwang-Jo Lee and Sung-Bong Yang Department of Computer Science, Yonsei University, Seoul, Republic of Korea {kjlee5435, yang}@cs.yonsei.ac.kr
More informationPSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
More informationFault Analysis in Software with the Data Interaction of Classes
, pp.189-196 http://dx.doi.org/10.14257/ijsia.2015.9.9.17 Fault Analysis in Software with the Data Interaction of Classes Yan Xiaobo 1 and Wang Yichen 2 1 Science & Technology on Reliability & Environmental
More informationAnalysis of Internet Topologies: A Historical View
Analysis of Internet Topologies: A Historical View Mohamadreza Najiminaini, Laxmi Subedi, and Ljiljana Trajković Communication Networks Laboratory http://www.ensc.sfu.ca/cnl Simon Fraser University Vancouver,
More informationHow To Perform An Ensemble Analysis
Charu C. Aggarwal IBM T J Watson Research Center Yorktown, NY 10598 Outlier Ensembles Keynote, Outlier Detection and Description Workshop, 2013 Based on the ACM SIGKDD Explorations Position Paper: Outlier
More informationMobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
More informationHadoop Operations Management for Big Data Clusters in Telecommunication Industry
Hadoop Operations Management for Big Data Clusters in Telecommunication Industry N. Kamalraj Asst. Prof., Department of Computer Technology Dr. SNS Rajalakshmi College of Arts and Science Coimbatore-49
More informationDemographics of Atlanta, Georgia:
Demographics of Atlanta, Georgia: A Visual Analysis of the 2000 and 2010 Census Data 36-315 Final Project Rachel Cohen, Kathryn McKeough, Minnar Xie & David Zimmerman Ethnicities of Atlanta Figure 1: From
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationIDENTIFICATION OF KEY LOCATIONS BASED ON ONLINE SOCIAL NETWORK ACTIVITY
H. Efstathiades, D. Antoniades, G. Pallis, M. D. Dikaiakos IDENTIFICATION OF KEY LOCATIONS BASED ON ONLINE SOCIAL NETWORK ACTIVITY 1 Motivation Key Locations information is of high importance for various
More informationMeasurement of Human Mobility Using Cell Phone Data: Developing Big Data for Demographic Science*
Measurement of Human Mobility Using Cell Phone Data: Developing Big Data for Demographic Science* Nathalie E. Williams 1, Timothy Thomas 2, Matt Dunbar 3, Nathan Eagle 4 and Adrian Dobra 5 Working Paper
More informationClustering Marketing Datasets with Data Mining Techniques
Clustering Marketing Datasets with Data Mining Techniques Özgür Örnek International Burch University, Sarajevo oornek@ibu.edu.ba Abdülhamit Subaşı International Burch University, Sarajevo asubasi@ibu.edu.ba
More information