How To Analyze Activity At Enron
Discovery of Email Communication Networks from the Enron Corpus with a Genetic Algorithm using Social Network Analysis

Garnett Wilson and Wolfgang Banzhaf

During the legal investigation of Enron Corporation, the U.S. Federal Energy Regulatory Commission (FERC) made public a substantial data set of the company's internal corporate emails. This work presents a genetic algorithm (GA) approach to social network analysis (SNA) using the Enron corpus. Three SNA metrics (degree, density, and proximity prestige) were applied to the detection of networks of high email activity and the presence of important actors with respect to email transactions. Quantitative analysis revealed that density and proximity prestige captured networks of high activity more so than degree. Subsequent qualitative analysis reveals that there are trade-offs in the selection of SNA metrics. Examination of the discovered social networks revealed that density and proximity prestige isolated networks involving key actors to a greater extent than degree. In particular, density picked out interesting patterns in terms of email volume, while proximity prestige better isolated key actors at Enron. The roles of the particular actors picked out by the networks, as reasons for their prominence, are also discussed.

I. INTRODUCTION

Enron Corporation filed for bankruptcy in December 2001 due to a mixture of corruption, fraudulent accounting, and poor regulation. Once one of the world's largest electricity and natural gas companies with over employees worldwide [1], its stock plummeted from heights of over $90 a share to $0.05 (Sept. 2003) during the ensuing scandal [2]. The Enron data set of emails was made public by the U.S. Federal Energy Regulatory Commission (FERC) during the legal investigation of Enron. This original corpus disclosed to the public included over messages belonging to 158 users that were sent or saved in folders during the collapse of Enron [3], spanning from 1998 to 2002 [4].
The corpus is now considered a valuable resource for research in link analysis, social network analysis, and natural language processing. This paper presents a genetic algorithm (GA) approach to the analysis of emails during the fall of Enron that uses fitness metrics from the field of social network analysis (SNA) to rank the activity of Enron employees and examine the possible key players at Enron. This work thus presents the first use (to the authors' knowledge) of the Enron corpus in the field of evolutionary computation (EC), and the first instance of SNA used with an evolutionary algorithm. Through the use of a GA with an SNA-based fitness measure, all senders and receivers of the Enron data set can be examined for relationships. Previous techniques have largely restricted SNA of the Enron data set to the 151 employees who had sent or stored the emails.

This work was supported by a PRECARN Postdoctoral Fellowship and Memorial University of Newfoundland. G. Wilson is with the Department of Computer Science, Memorial University of Newfoundland, St. John's, NL, A1B 3X5, Canada ( [email protected]). W. Banzhaf is head of the Department of Computer Science, Memorial University of Newfoundland, St. John's, NL, A1B 3X5, Canada ( [email protected]).

The following section examines related previous work. Section 3 explains the creation and composition of the data set used in our experiments, and Section 4 describes the SNA measures used by the GA for examination of the emailing activities of the Enron employees. Section 5 provides details on the SNA-based GA and experiment parameterization. Section 6 provides quantitative results, and visualizations of the final networks corresponding to the highest value for each SNA metric are then examined in Section 7. Conclusions follow in Section 8.

II. PREVIOUS WORK

The Enron data set has been used extensively for research including data mining, text analysis, and natural language processing.
To provide a few examples, Berry and Browne [5] detected topics and clustered messages using sparse term-by-message matrices and a low-rank non-negative matrix factorization algorithm. Priebe et al. [6] used a technique called scan statistics, which slides a moving window over portions of data to find outlying points corresponding to deviations from normal communications among individuals. Keila and Skillicorn [7] examined structural features of emails, such as message length and word usage and frequency, to detect patterns of unusual communication.

The Enron corpus has also been subject to SNA of varying rigor. Shetty and Adibi [4] produced a social network from their cleaned version of the data set involving the 151 employee accounts, where a link was only established between employees if 5 or more emails were exchanged and the exchange of emails was reciprocal. Chapanond et al. [8] produce both a directed and an undirected social network from the data set, where a link is only considered if 30 or more emails have been exchanged between any of the 150 employees with the accounts where the emails were stored. (Cleaning or relevance-based design decisions cause the number of main employees considered to range from 147 to 151 throughout the literature.) Chapanond et al. provide particular SNA measures on the graphs including degree, distance, and betweenness. McCallum et
al. [9] describe an ART (Author-Recipient-Topic) model based on a Bayesian network to simultaneously model message content and the directed social network in which the messages are sent. The authors only consider a social network among pre-selected employees in particular divisions of the Enron company, with those being a subset of a total of 147 from the main 151 employees. Duan et al. [10] use a social network only involving 150 of the employees that were the original focus of the dataset, with edges weighted according to the emails sent between users. They use a link analysis algorithm to rank those 150 employees according to their implied importance based on email communication.

Diesner et al. [11] provided a more in-depth social network-based examination of the email network across the time frame of the emails included in the corpus. Expanding on previous work, they included some senders of emails as graph nodes even if they were not members of the list of Enron employees whose emails constituted the data set. In particular, they added 525 previously unaccounted employees of Enron and Andersen (the accountancy firm associated with the Enron scandal) and their associated email addresses to raise the number of addresses considered from 151 to As in some previous work, they also weighted the edges according to the number of emails sent and used a directed graph (digraph). They found that during the actual crisis period, communication across employees became more diverse in terms of contacts and corporate roles, and previously disconnected employees established ongoing mutual communication. The authors also found that interpersonal communication generally intensified and spread widely throughout the network as the collapse of the company progressed, leading to increased density of the network. Frantz and Carley [12] examine actors in the Enron database using 165 weekly snapshots of their activity. The authors report on the five SNA metrics of betweenness, degree (in and out), closeness, and eigenvector.
This work goes beyond previous SNA-based analyses of the Enron corpus, as it introduces intelligent search appropriate for very large search spaces. Most researchers [4, 8, 9] only analyze a social network based on the approximately 151 primary employees of the Enron data set. Diesner et al. [11] and Frantz and Carley [12] were the only researchers to use more than the primary 151 employees in the data set. Diesner et al. added only a select number of extra individuals they felt were pertinent, whereas Frantz and Carley examined addresses, but produced manageable social networks by only examining weekly snapshots of the emailing behaviour.

Rather than adding arbitrary individuals to the social network or using short windows of time, the SNA GA introduced in this work uses an evolutionary approach to intelligently add appropriate individuals, selected from the entire set of sender or recipient addresses in the Enron database, to a selected social subnetwork. That is, the GA search mechanism is efficient at exploring the search space of possible networks by retaining the most relevant partitions of account networks (called building blocks in evolutionary computation models) and then attempting to add associated accounts of interest in an intelligent search. The algorithm builds networks of interest by keeping interesting subgraphs as building blocks in the evolutionary process and applying mutation to explore other connections to produce a subgraph of even greater interest. Given the size of the data set when all senders and receivers are considered, a computational intelligence approach such as this is an appropriate alternative to exhaustive search or selection of arbitrary accounts for determining SNA networks of interest. The search space corresponding to the Enron data set used in this work (that of Shetty and Adibi [4]) involves unique addresses and emails which link them.
For instance, in this work we examine networks of interest featuring 50 unique emails (up to 100 nodes) in a GA individual. Given a total of unique emails, this is a search space of choose 50, or!!!, which is greater than The SNA GA approach of this work provides a way of isolating areas of the entire network of the Enron database participants (senders and recipients) for further inspection, as the data set in its entirety could not be searched exhaustively, let alone comprehended or analyzed for patterns by a user. By using evolutionary computation, all participants in the Enron data set and their behaviour can be examined.

While Langdon et al. [13] and Luthi et al. [14] have both used social network analysis metrics to examine the relationship among authors in the genetic programming (GP) community, to the authors' knowledge evolutionary computation techniques themselves have not yet been applied to social network analysis. Graph theory and link analysis, however, are well-established tools for the detection of fraud across a number of domains. For instance, Galloway and Simoff [15] describe the use of the NetMap commercial software tool to determine an actual case of insurance fraud. Data mining techniques have been used along with link analysis search for fraud detection. Cortes et al. [16] developed a method to analyze large dynamic graphs of telecommunications transactions to identify fraud by examining small subgraphs of interest called communities of interest. Rather than using social network analysis metrics to determine subnetworks of interest, the authors examine subgraphs consisting of nodes connected by the highest-valued edges to a central node within some low radius. This work is the first instance of a GA using SNA measures as a fitness metric.

III. DATA SET

The Enron corpus used in this work is that provided by Shetty and Adibi [4].
The dataset was cleaned by the authors to remove duplicate emails, messages determined to contain junk data, blank messages, and emails generated by the mail system as transaction failures. A MySQL database was formed from the cleaned corpus, and contained four tables for the entities of employees, messages, recipients, and reference information. For the purposes of
reproducibility, the names used to denote tables and fields are those used in the MySQL database Shetty and Adibi [4] provide for download on the associated web site. In total, the dataset contained messages (in the message table) stored in the folders of 151 Enron employees (employeelist table), with a total of distinct senders for all messages (including the 151 employees in the employeelist table).

For the purposes of the SNA-based GA, a particular employee ID (eid) is selected by a GA instruction to specify an employee from the employeelist table containing 151 employee entities. From the recipientinfo table, all messages (mid) for which the employee email address (Email_id) corresponding to eid was the recipient (rvalue) are retrieved; that is, messages are retrieved where rvalue = Email_id. Finally, from these emails, a particular email from the message table is specified by the GA (as an algorithm-defined value of mid). The sender of this message (sender) is then retrieved from the message table to provide an (employee, sender) pair joined by an email transaction. The selection of the relevant database entities specified by the GA is depicted in Figure 1. For faster retrieval, a hash table with recipient emails (Email_id) as keys, and a vector of the sender addresses (sender) for each message as values, is created. This hash table is kept in memory during execution. The additional details of the interpretation of a GA instruction are provided in Section 5.

IV. SOCIAL NETWORK ANALYSIS

Three SNA metrics are used to identify networks of interest from the larger social network corresponding to the Enron corpus: degree, density, and proximity prestige. Degree and density are rooted in the identification of the importance of agents in an interacting network using the SNA concept of centrality. Centrality assumes that important actors usually occupy strategic locations in the network without relation to the direction of the transactions (edges).
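
The in-memory lookup described in Section III (recipient address as key, list of sender addresses as value) can be sketched as follows. This is a minimal illustration, not the authors' Java implementation: the function name `build_sender_index`, the row layouts, and the toy addresses are assumptions introduced here, and only the table and column names (`recipientinfo`, `message`, `mid`, `rvalue`, `sender`) come from the text.

```python
from collections import defaultdict


def build_sender_index(recipient_rows, message_rows):
    """Map each recipient address to the senders of the messages it received.

    recipient_rows: iterable of (mid, rvalue) pairs, mirroring recipientinfo.
    message_rows:   iterable of (mid, sender) pairs, mirroring message.
    Both row layouts are assumptions made for this sketch.
    """
    sender_of = dict(message_rows)      # mid -> sender address
    index = defaultdict(list)           # recipient address -> [sender, ...]
    for mid, rvalue in recipient_rows:
        if mid in sender_of:            # skip recipients of unknown messages
            index[rvalue].append(sender_of[mid])
    return index


# Toy rows with hypothetical addresses (not taken from the corpus).
messages = [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")]
recipients = [(1, "x@example.com"), (2, "x@example.com"), (3, "y@example.com")]

index = build_sender_index(recipients, messages)

# An (employee, sender) pair: the recipient plus the sender of its second email.
pair = ("x@example.com", index["x@example.com"][1])
```

Keeping the senders in a list per recipient means a GA instruction can select "the k-th email received by this employee" by position, without a further database query.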
While directed edges are displayed in the network so end users are aware of the direction of emails sent, calculations for both these metrics (described in this section) are based on the undirected number of messages sent in either direction between two accounts. Proximity prestige, in contrast, provides the study with a metric that involves the direction of edges in its calculation.

A. Degree

Degree is the average number of edges incident with each node in the graph [17]. For the purpose of AML, it is a measure of the overall transaction-based activity of the nodes in the network. While valued versions of this metric are possible, the non-valued version is used in these experiments to contrast with the valued alternative of the density metric (explained in the next section). The mean degree of the network is given by the equation

$$\bar{d} = \frac{\sum_{i=1}^{g} d(n_i)}{g}$$

where $g$ is the number of nodes in the network, $n_i$ is a node in the graph, and $d(n_i)$ is the degree of a node (number of lines incident with that node). To provide a visualization of the concept of degree in a graph, hypothetical networks of high and low degree are shown in Figure 2.

Fig. 1. Database entities used by the SNA GA system to create an (employee recipient address, sender address) pair connected by an email message. Only table columns relevant to this analysis are shown in the diagram.

Fig. 2. Hypothetical networks of high and low degree.

B. Density

Density is the proportion of possible lines that are actually present in the graph [17]. For a valued graph, as is the case here where the number of emails are construed as edges connecting the accounts construed as nodes, the density ($\Delta$) is measured as

$$\Delta = \frac{\sum_{k=1}^{L} v_k}{g(g-1)}$$

where $v_k$ are the values over all $k$ for the values $\{v_1, v_2, \ldots, v_L\}$ attached to the set of lines (edges) $L$, and $g$ is the number of nodes. The density metric was expected to perform well in detecting interesting networks, as it measures the interconnectedness of a network while incorporating the number of transactions. Note that for a non-valued graph the value of the density will fall in the range [0, 1]; however, there is no such restriction on the valued version of the density metric. Hypothetical networks of high and low density are depicted in Figure 3.

Fig. 3. Hypothetical networks of high and low density.

C. Proximity Prestige

The proximity prestige measure is used to determine what is known as the account's influence domain, where the influence domain of an actor is the set of actors who are both directly and indirectly linked to that actor. This set includes all actors from which actor $i$ is reachable, where two nodes are reachable from one another if there exists a path between them. A path is a walk in which all nodes and lines are distinct, where a walk is a sequence of nodes and lines, starting and ending with nodes, where each node is incident with the lines preceding and following it in the sequence. The influence domain for actor $i$ consists of all actors whose entries in the $i$th column of a distance matrix are finite; the number of actors in the influence domain for an actor $i$ is denoted $I_i$. In practice, proximity prestige is used to measure closeness using distances to, rather than from, each actor in a directed graph. The proximity measure for the problem domain of this work, from [17], is the ratio of the proportion of actors who can reach $i$ to the average distance of these actors from $i$. This proximity prestige measure is

$$P_P(n_i) = \frac{I_i / (g-1)}{\sum_{j} d(n_j, n_i) / I_i}$$

where $I_i$ is the number of actors in the influence domain of node $n_i$, $g$ is the total number of nodes in the graph, and $d(n_j, n_i)$ is the distance that actor $j$ is from actor $i$.

Distance, or more precisely geodesic distance, between nodes $n_i$ and $n_j$ is found by using power matrices. Distance from one node to another is simply the length of the shortest path between them, where in a directed graph the distances from $n_i$ to $n_j$ and from $n_j$ to $n_i$ can be different. The length of the shortest path (the distance) from $n_i$ to $n_j$ is the first integer power $p$ of the original sociomatrix (with $p = 1$) for which the $(i,j)$ element is non-zero:

$$d(n_i, n_j) = \min \{\, p : x^{[p]}_{ij} > 0 \,\}, \quad p > 0$$

where $x^{[p]}_{ij}$ denotes the $(i,j)$ element of the $p$th power of the sociomatrix. When actors who can reach actor $i$ become closer, the $P_P$ ratio becomes larger and actor $i$ has greater prestige. $P_P$ has a maximum value of 1 when all actors are adjacent to $n_i$, and a minimum value of 0 when $n_i$ is unreachable. A group-level measure of proximity prestige is simply the average of actor proximities:

$$\bar{P}_P = \frac{1}{g} \sum_{i=1}^{g} P_P(n_i)$$

A network with individual members of high and low prestige is shown in Figure 4 to provide a visualization of the concept.

Fig. 4. A hypothetical network containing members of high and low proximity prestige.

V. GENETIC ALGORITHM AND EXPERIMENTAL SETUP

A. Individual Representation

The individuals in the genetic algorithm consist of binary strings composed of 50 instructions, with each instruction consisting of 22 bits. The first 8 bits of an instruction correspond to the database entity that is to be queried, which is one of the employees in the employee list table. The binary number corresponding to the first 8 bits is interpreted as its integer equivalent, modulo the number of entities (primary keys) in the database (151). The largest number of emails sent to an individual in the employee table is 9 052, which can be specified by 14 bits ($2^{14}$ = 16 384). The following 14 bits thus map to an email sent to the individual specified in the first portion of the instruction, modulo the number of emails present for that entity.
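
The interpretation of a single 22-bit instruction can be sketched as below. This is an illustrative reading of the description above rather than the authors' code: the function name `decode_instruction`, the 0-based employee index, and the `emails_received` list are assumptions, and the sketch assumes every employee has received at least one email.

```python
ACCOUNT_BITS = 8       # first segment: which employee (recipient) to query
EMAIL_BITS = 14        # second segment: which received email to pick
NUM_EMPLOYEES = 151    # entities in the employee list table


def decode_instruction(bits, emails_received):
    """Decode one instruction into (employee index, email index).

    bits:            string of 22 characters, each '0' or '1'.
    emails_received: emails_received[e] is the number of emails received by
                     employee e (0-based index, an assumption of this sketch).
    """
    if len(bits) != ACCOUNT_BITS + EMAIL_BITS or set(bits) - {"0", "1"}:
        raise ValueError("expected a 22-character binary string")
    # First 8 bits as an integer, wrapped onto the 151 employees.
    employee = int(bits[:ACCOUNT_BITS], 2) % NUM_EMPLOYEES
    # Remaining 14 bits as an integer, wrapped onto that employee's emails.
    email = int(bits[ACCOUNT_BITS:], 2) % emails_received[employee]
    return employee, email


# Toy usage: every employee is assumed to have received exactly 3 emails.
counts = [3] * NUM_EMPLOYEES
decoded = decode_instruction("11111111" + "00000000000101", counts)
```

The modulo operations guarantee that every possible bit string decodes to a valid (employee, email) query, so mutation can never produce an invalid instruction.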
In particular, for the purpose of building the SNA network, the main sender of the received email is of interest: individuals who were CCed are not considered per individual instruction. (However, an individual that receives an email in virtue of being CCed can appear in an SNA graph by being specified as the receiver in the first segment of an instruction.) Each instruction thus corresponds to a (receiver, sender) pair in the real world. The entire GA individual is a list of such pairs, which represents a network of possibly interacting accounts (nodes) connected by sent emails (edges).

The 50 instructions comprising an individual are converted to a sociomatrix to allow fitness evaluation by the GA in terms of the SNA metrics. A sociomatrix is a matrix indexed by the set of originating actors (rows) and the set of receiving actors (columns). The total unique emails between the originating accounts and receiving accounts are the sociomatrix values for the ties between the actors. The actors in rows or columns can be any of the 151 employees (receivers) from the main employee table or any of the other total senders who emailed messages to the 151 employees (for a total addresses). Once an individual sociomatrix of interest is found by the GA, it is displayed as its equivalent graph of nodes and edges. The set of actors (all sending or receiving accounts) is used as the nodes of the graph, and the set of email messages is used as the collection of edges. Edges are weighted according to the number of unique emails sent from one actor to the other, where edge thickness grows in integer increments of one email. The interpretations of a simplified, hypothetical GA individual are shown in Figure 5.

Fig. 5. Interpretation of a GA individual as bit string, sociomatrix, and network of email senders and recipients.

B. Algorithm Details

Tournaments were steady state and consisted of 1500 rounds, which was determined to be of reasonable duration across all three chosen SNA metrics based on preliminary experiments. During each tournament round, four individuals were chosen to compete. In the competition, each binary string individual is interpreted by querying the database and producing a sociomatrix corresponding to email transactions as described in the previous section. The top two individuals' sociomatrices are declared winners, and the binary strings (genotypes) of the last two individuals are replaced with the winning individuals' genotypes (becoming the children). Following selection, mutation was applied using an XOR mask on an individual instruction segment of the binary string. Mutation occurred with a threshold of 0.9, after choosing a particular instruction from the individual's instruction set using a uniform random distribution. Crossover was not performed, as retaining groups of related instructions as a sequence is not appropriate for this domain: in this representation, each instruction represents potentially unique emails for particular accounts. Mutation serves to explore new account, email pairings in the candidate networks. At the end of the tournament, the individual with the highest value for the SNA metric based on its sociomatrix is displayed to the user as a directed, cyclic graph with edges weighted to reflect the number of email(s) sent between them. A similar random, greedy search was conducted where the best two of four individuals in a tournament round were kept and the two losing individuals were replaced with randomly generated candidate networks.

Fifty trials were performed using an Apple iMac with an Intel Core 2 Duo CPU at 2.8 GHz and 4.00 GB of RAM, running OS X Leopard. The solution was implemented using Java and MySQL. The JAMA library [18] was used for the matrix manipulations required for the evaluation of social network-based fitness metrics. All visualizations were produced using a customized version of the Prefuse framework [19]. The GA parameters used are summarized in Table 1.

TABLE I. EXPERIMENTAL GENETIC ALGORITHM PARAMETERS
Parameter | Value
Experimental trials | 50
Tournament rounds | 1500 per trial
Instruction format | 22 bits total (8 bits account, 14 bits transaction)
Genotype structure | 50 instructions per individual
Mutation | instruction-level XOR mask, threshold = 0.9
Fitness metric | degree, density, proximity prestige
Objective | Find network that maximizes metric

VI. QUANTITATIVE SNA PERFORMANCE

It should be noted that the quantitative metrics used in these networks do not necessarily measure success per se. The actual benefit of the SNA-based GA search is qualitative in nature, where the aim of the search is to provide the end users with subnetworks of interest whereby highly connected or active accounts can be discovered (discussed in Section 7). The quantitative results provide a feel for the performance of the GA and SNA metric combinations in general, and confirm that GA search finds networks corresponding to higher SNA values than a random, greedy alternative (thus the GA provides a means of intelligently searching the space). Greedy, exhaustive (as opposed to random) search is not a viable option due to the enormous size of the search space (see Section 2).

The results shown in Figures 6, 7, and 8 indicate the best final degree, density, and proximity prestige measures at the end of each trial for fifty trials of 1500 tournament rounds, comparing GA and random greedy search. In the boxplot figures, each box indicates the lower quartile, median, and upper quartile values. If the notches of two boxes do not overlap, the medians of the two groups differ at the 95% confidence level. Points represent outliers beyond whiskers of 1.5 times the interquartile range. To accompany the boxplots, Table 2 provides precise measures for the GA and random greedy search algorithms.
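
The steady-state tournament of Section V.B and the random greedy baseline it is compared against can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the population size of 20 is hypothetical (the text does not state one), "threshold of 0.9" is read here as a 0.9 probability of mutating a child, and `toy_fitness` is a stand-in for the SNA metrics, which in the paper require querying the database to build a sociomatrix.

```python
import random

INSTRUCTION_BITS = 22
NUM_INSTRUCTIONS = 50


def random_individual(rng):
    """A genotype: 50 instructions, each stored as a 22-bit integer."""
    return [rng.getrandbits(INSTRUCTION_BITS) for _ in range(NUM_INSTRUCTIONS)]


def mutate(individual, rng, threshold=0.9):
    """Copy the parent; with probability `threshold`, XOR one uniformly
    chosen instruction with a random 22-bit mask (no crossover)."""
    child = list(individual)
    if rng.random() < threshold:
        pos = rng.randrange(len(child))
        child[pos] ^= rng.getrandbits(INSTRUCTION_BITS)
    return child


def run_tournaments(fitness, rng, pop_size=20, rounds=1500, greedy=False):
    """Each round: pick four individuals, keep the best two, replace the
    worst two with mutated copies of the winners (GA) or with fresh random
    individuals (random greedy baseline)."""
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(rounds):
        picked = rng.sample(range(pop_size), 4)
        picked.sort(key=lambda i: fitness(population[i]), reverse=True)
        winners, losers = picked[:2], picked[2:]
        for w, l in zip(winners, losers):
            if greedy:
                population[l] = random_individual(rng)
            else:
                population[l] = mutate(population[w], rng)
    return max(population, key=fitness)


def toy_fitness(individual):
    """Stand-in fitness (total number of set bits); NOT an SNA metric."""
    return sum(bin(instr).count("1") for instr in individual)


ga_best = run_tournaments(toy_fitness, random.Random(0))
greedy_best = run_tournaments(toy_fitness, random.Random(0), greedy=True)
```

Replacing `toy_fitness` with a function that decodes the instructions, builds the sociomatrix, and returns degree, density, or proximity prestige would give the structure of the experiment described above.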
TABLE II. ALGORITHM MEAN, MINIMUM, AND MAXIMUM SNA METRICS
Columns: Genetic Algorithm (Mean, Min, Max) | Greedy Search (Mean, Min, Max)
Rows: Degree, Density, Prestige

Fig. 6. Boxplot of highest degree found by the GA at the end of 1500 rounds over 50 trials for a social network of up to 100 nodes.

Examining Figures 6, 7, and 8, it is evident that the SNA-based GA search outperforms random, greedy search with very high statistical significance (no notches of the boxplots overlap or are even in close proximity). Examining Figure 6, we see that the degree measure indicates that, on average, approximately 1.19 to 1.61 edges (Table 2) are attached to each node. This means that the degree metric is finding sparse networks. Density in Figure 7 ranges from to a maximum (Table 2), indicating that the GA can choose networks that possess up to 9% of all possible connections among nodes. Even though density is valued, the proportions can roughly be construed as percentages when edges represent single emails (as is the case for the 9% best network, Figure 11). In practical terms, this means that the density networks represent high-volume and active communication among individuals. Given that proximity prestige (Figure 8) reflects what proportion of actors can reach a particular actor, values of 0.32 to 0.42 (Table 2) in the networks discovered by the GA indicate the likely presence of key actors. We now complement these quantitative observations with qualitative analysis of the networks.

VII. QUALITATIVE ANALYSIS: NETWORKS

A. Degree

The network interpretation for the best individual (highest fitness level) for the degree metric is shown in Figure 9. The degree metric expresses the overall activity in the network. A high-degree (non-valued in this work) network is composed of accounts that are involved in a large number of incoming or outgoing emails to different individuals, independent of the number of emails sent between those individuals.
The edges are weighted in Figure 9 as an indication of the number of emails sent, but the degree metric is left as non-valued to provide greater contrast with the valued version of the density metric used (see Section 4).

Fig. 7. Boxplot of highest density found by the GA at the end of 1500 rounds over 50 trials for a social network of up to 100 nodes.

Fig. 8. Boxplot of highest proximity prestige found by the GA at the end of 1500 rounds over 50 trials for a social network of up to 100 nodes.

Fig. 9. Highest degree network found by the GA. Thickness of edges reflects number of emails sent.
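
To make the contrast between the three group-level metrics concrete, the sketch below computes them for a small sociomatrix. It follows the formulas of Section IV under stated assumptions: rows are senders and columns are receivers, mean degree counts distinct neighbours while ignoring direction and edge values, density sums all off-diagonal values over g(g-1), and geodesic distances are found by breadth-first search, which yields the same distances as the matrix-power method described in the paper but is not the method the authors used. All function names are hypothetical.

```python
from collections import deque


def mean_degree(x):
    """Non-valued mean degree: average count of distinct neighbours per
    node, ignoring edge direction and the number of emails on each edge."""
    g = len(x)
    total = 0
    for i in range(g):
        total += sum(1 for j in range(g)
                     if j != i and (x[i][j] > 0 or x[j][i] > 0))
    return total / g


def valued_density(x):
    """Valued density: sum of all off-diagonal tie values over g(g-1)."""
    g = len(x)
    total = sum(x[i][j] for i in range(g) for j in range(g) if i != j)
    return total / (g * (g - 1))


def proximity_prestige(x, i):
    """P_P(n_i): share of actors who can reach i, divided by their mean
    geodesic distance to i (distances found by BFS along reversed edges)."""
    g = len(x)
    dist = {i: 0}
    queue = deque([i])
    while queue:
        k = queue.popleft()
        for j in range(g):
            if x[j][k] > 0 and j not in dist:   # j sends to k, so j reaches i
                dist[j] = dist[k] + 1
                queue.append(j)
    reach = len(dist) - 1                       # I_i, excluding i itself
    if reach == 0:
        return 0.0                              # i is unreachable
    mean_distance = sum(dist.values()) / reach
    return (reach / (g - 1)) / mean_distance


def group_prestige(x):
    """Group-level proximity prestige: average over all actors."""
    return sum(proximity_prestige(x, i) for i in range(len(x))) / len(x)


# Toy network: actor 0 sent 3 emails to actor 1; actor 1 sent 1 to actor 2.
toy = [[0, 3, 0],
       [0, 0, 1],
       [0, 0, 0]]
metrics = (mean_degree(toy), valued_density(toy), group_prestige(toy))
```

In the toy network, raising the 0-to-1 tie from 3 emails to 30 would increase the valued density but leave mean degree and proximity prestige unchanged, which mirrors the trade-off discussed in this section.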
While this network represents a good attempt at detecting employee accounts that are quite active, we can see that the metric only picks up one edge that involves more than one email between two parties. While degree can pick up accounts with much activity, it is by chance that it will find individuals that are sending large numbers of emails between one another. That is, the fact that thicker edges are detected in a non-valued degree-based network is a matter of chance.

B. Density

The network interpretation for the best individual (highest fitness level) is shown in Figure 10 for the density metric. This network represents a balance between the number of emails sent among individuals and the overall activity of the accounts.

Fig. 10. Highest density network found by the GA. Thickness of edges reflects net amount of transaction.

Comparing Figures 9 and 10, it is readily evident that the density metric finds a network incorporating thicker edges. This corresponds to a network where it can be seen that particular employees are sending larger volumes of email between one another. Since each GA individual / final network consists of a fixed number of email transactions, the presence of the thicker edges for the density metric (Figure 10) results in a network of fewer nodes compared to degree (Figure 9). In both degree and density, the degree of an individual node ranges from 1 to 4, and there does not actually appear to be a great deal of difference in the average level of activity per node expressed in each of the networks. (That is, while the degree network simply has more nodes than the density network, the individual nodes are not more active.)

C. Proximity Prestige

The network interpretation for the highest proximity prestige group is shown in Figure 11. The network features a number of nodes that are connected to a high number of other nodes with (incoming or outgoing) emails (edges) compared to the density and degree networks. Whereas other networks have nodal degrees of up to 4, there are nodes in this network with nodal degree 5 ([email protected]), nodal degree 6 ([email protected]), and nodal degree 10 ([email protected]). Proximity prestige finds a network composed of high-activity (individual) nodes, but it does not necessarily pick up large-volume connections between individuals as occurs with the density metric (compare Figures 10 and 11). Proximity prestige appears to be the best metric for locating individuals of interest in terms of high degree (active emailing behavior), even better than the degree metric (compare Figures 9 and 11). It is likely that this occurs because the presence of high-degree individuals in a GA's candidate network is rewarded implicitly by the group-level proximity prestige metric, whereas the group-level degree metric rewards a network with high degree overall while not necessarily favoring networks where a few individuals have very high degree.

Fig. 11. Highest proximity prestige network found by the GA. Thickness of edges reflects net amount of transaction.

D. Key Actor Analysis

In terms of the relevance of individuals picked up by the network, the job titles associated with the email addresses are provided as available from [4], and additional information about employees is otherwise cited. The density metric picks up a number of important actors, one being [email protected]. Grigsby was a manager at Enron, and McCallum et al. [9] in their topic analysis note that he was an active emailer in virtue of being involved in the sports pool. The density network also picks up two of Enron's directors, [email protected] and [email protected], and two Vice Presidents ([email protected] and [email protected]). In contrast, the degree network picks up fewer nodes of interest. No directors are present, but one vice president ([email protected]) is present in the density network. However, the presence of the vice president is only due to a single connection to a key actor.
The proximity prestige network involves the presence of three key actors, plus an additional two traders and a director ([email protected]). Proximity prestige is the only metric to pick up any traders in the best network, where this was an important role at Enron. The key players [email protected] (nodal degree 6) and [email protected] (nodal degree 5) are present in virtue of their participation in the sports pool. As mentioned, Mike Grigsby was a manager who was also prominent in the density network. Eric Bass seems to have been the coordinator of a fantasy football league for Enron employees [9]. The highest nodal degree of 10 in the proximity prestige network belongs to [email protected]. Her account is actually present in all three networks (Figures 9, 10, and 11). She is mentioned in the study of Frantz and Carley [12] as one of the top five key actors in their weekly snapshots. Although her official job title was not readily discernible, examination of her emails reveals that she seems to have been active in the administration of credit worksheets involved in Enron's trading activities.

VIII. CONCLUSION

This work presents the analysis of three SNA-based metrics for the examination of the Enron dataset: degree, density, and proximity prestige. The quantitative results for each SNA metric were compared between a GA and random greedy search. It was found that the GA definitively outperformed a random greedy search, meaning that a GA is an appropriate application of computational intelligence to search the large Enron email dataset space. It was expected, based on quantitative analysis, that the density and prestige metrics would provide the most useful networks compared to degree, and this was evidenced by qualitative analysis. The social networks corresponding to the highest measure for each SNA metric in the GA were then examined using a visualization tool. Particular SNA metric-based networks provided specialized information, sometimes to the detriment of other metrics.
For instance, a high measure of the group-level degree metric came at the cost of the absence of individual nodes with higher nodal degree or edges reflecting large numbers of emails transferred. Also, while higher measures of the proximity prestige metric provided nodes of high nodal degree, there was also an absence of edges with multiple email transfers. Overall, the density and proximity prestige SNA metrics were found to be useful in isolating the most interesting accounts and their associated activities. Future work will examine in greater detail the relationship among the three metrics as GA evolution is directed by a particular metric. Also, consensus building using multiple SNA metrics will be explored to mitigate the trade-off costs of one metric over another in order to isolate networks of increased interest.

REFERENCES

[1] B. McLean and P. Elkind, The Smartest Guys in the Room: The Amazing Rise and Scandalous Fall of Enron. New York, USA: Portfolio.
[2] R. Bryce, Pipe Dreams: Greed, Ego, and the Death of Enron. New York, USA: PublicAffairs.
[3] B. Klimt and Y. Yang, "Introducing the Enron Corpus," in Fifth Conference on Email and Anti-Spam (CEAS) 2008, Mountain View, USA.
[4] J. Shetty and J. Adibi, "The Enron Email Dataset Database Schema and Brief Statistical Report."
[5] M. Berry and M. Browne, "Email Surveillance Using Non-negative Matrix Factorization," Computational & Mathematical Organization Theory, vol. 11, no. 3.
[6] C. Priebe, J. Conroy, D. Marchette, and Y. Park, "Scan Statistics on Enron Graphs," Computational & Mathematical Organization Theory, vol. 11, no. 3.
[7] P. Keila and D. Skillicorn, "Structure in the Enron Email Dataset," Computational & Mathematical Organization Theory, vol. 11, no. 3.
[8] A. Chapanond, M. Krishnamoorthy, and B. Yener, "Graph Theoretic and Spectral Analysis of Enron Email Data," Computational & Mathematical Organization Theory, vol. 11, no. 3.
[9] A. McCallum, X. Wang, and A. Corrada-Emmanuel, "Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email," Journal of Artificial Intelligence Research, no. 30.
[10] Y. Duan, J. Wang, M. Kam, and J. Canny, "A Secure Online Algorithm for Link Analysis on Weighted Graph," in Proceedings of the Workshop on Link Analysis, Counterterrorism and Security, SIAM International Conference on Data Mining 2005, Newport Beach, 2005.
[11] J. Diesner, T. Frantz, and K. Carley, "Communication Networks from the Enron Email Corpus: 'It's Always About the People. Enron is no Different'," Computational & Mathematical Organization Theory, vol. 11, no. 3.
[12] T. Frantz and K. Carley, Dynamics of Organizational Chatter. Presented at North American Association for Computational Social and Organization Sciences (NAACSOS) 2006. [PowerPoint]. Available: nizationchatter_presentation.pdf
[13] W. B. Langdon, R. Poli, and W. Banzhaf, "An eigen analysis of the GP community," Genetic Programming and Evolvable Machines, pp. upcoming.
[14] L. Luthi, M. Tomassini, and M. Giacobini, "The Genetic Programming Collaboration Network and its Communities," in Proceedings of GECCO 2007, London, UK, 2007.
[15] J. Galloway and S. Simoff, "Network Data Mining: Discovering Patterns of Interaction Between Attributes," in Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD) 2006, Singapore.
[16] C. Cortes, D. Pregibon, and C. Volinsky, "Computational Methods for Dynamic Graphs," Journal of Computational and Graphical Statistics, vol. 12.
[17] S. Wasserman and K. Faust, Social Network Analysis: Methods and Applications. New York, USA: Cambridge University Press.
[18] J. Hicklin, C. Moler, P. Webb, R. Boisvert, B. Miller, R. Pozo, and K. Remington, "JAMA: A Java Matrix Package."
[19] J. Heer, S. K. Card, and J. Landay, "Prefuse: A Toolkit for Interactive Information Visualization," in Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2005), Portland, USA, 2005.
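
As an illustration of the metrics compared in the conclusion, the following minimal Python sketch computes nodal degree, density, and proximity prestige on a small directed email graph. It assumes the standard textbook definitions from Wasserman and Faust [17], which may differ in detail from the formulation used in this work. The function names and the toy sender/recipient edges are hypothetical and are not taken from the Enron corpus.

```python
from collections import deque


def density(nodes, edges):
    """Directed density: observed arcs divided by the g*(g-1) possible arcs."""
    g = len(nodes)
    return len(edges) / (g * (g - 1)) if g > 1 else 0.0


def nodal_degree(node, edges):
    """Number of arcs touching the node (in-degree plus out-degree)."""
    return sum(1 for s, t in edges if s == node or t == node)


def proximity_prestige(node, nodes, edges):
    """Share of other actors that can reach `node`, divided by their mean distance to it."""
    # Reverse adjacency: for each recipient, the list of its senders.
    senders = {n: [] for n in nodes}
    for s, t in edges:
        senders[t].append(s)

    # Breadth-first search backwards from `node` yields the shortest
    # distance from every actor that can reach it.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for s in senders[cur]:
            if s not in dist:
                dist[s] = dist[cur] + 1
                queue.append(s)

    reach = len(dist) - 1  # size of the influence domain, excluding the node itself
    if reach == 0:
        return 0.0
    mean_distance = sum(dist.values()) / reach
    return (reach / (len(nodes) - 1)) / mean_distance


# Hypothetical toy network: an arc (s, t) means s sent email to t.
nodes = ["a", "b", "c", "d"]
edges = {("a", "b"), ("b", "c"), ("a", "c"), ("d", "c")}

print("density:", density(nodes, edges))
for n in nodes:
    print(n, nodal_degree(n, edges), proximity_prestige(n, nodes, edges))
```

In this toy graph, node c is reached directly by every other actor and so receives the maximum proximity prestige, while node a, which only sends mail, receives zero. This mirrors the observation above that prestige emphasizes who is reached rather than how many emails cross a given edge.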
