Dynamic Network Analyzer Building a Framework for the Graph-theoretic Analysis of Dynamic Networks
|
|
|
- Meghan Malone
- 10 years ago
- Views:
Transcription
1 Dynamic Network Analyzer Building a Framework for the Graph-theoretic Analysis of Dynamic Networks Benjamin Schiller and Thorsten Strufe P2P Networks - TU Darmstadt [schiller, strufe][at]cs.tu-darmstadt.de Keywords: Networks, Graph theory, Measurements, Algorithm design and analysis Abstract With the rise of online social networks and other highly dynamic system, the need for the analysis of their structural properties has grown in the last years. While the recomputation of graph-theoretic metrics is feasible for investigating a small set of static system snapshots, this approach is unfit for the application in highly dynamic systems where we aim at frequent property s. Based on the concept of data streams, new algorithms have been developed that the computed properties based on changes instead of recomputing them regularly. While there exists a plethora of frameworks and libraries for the analysis of static networks, there is currently no framework for the graph-theoretic analysis and development of new algorithms for dynamic networks. In this paper, we discuss a set of requirements a framework must meet to implement the general workflow for analyzing dynamic networks. We then introduce the architecture of a first prototype for such a framework, the Dynamic Network Analyzer (DNA). 1. INTRODUCTION Many systems in the focus of different research areas can be modeled as graphs. Common examples include transportation, computer, social, and biological networks. The graphtheoretic analysis of their underlying topologies allows to gain insights into their properties. In the case of evolving networks, whose structure changes over time, it is crucial to observe the transition of key properties to understand their characteristics and dynamics. A prominent example of dynamic systems are social networks, whose development over time is frequently studied. Interesting graph-theoretic properties include the dynamics of community structures [11], evolution of paths lengths [7], and the development of user centrality and roles over time [5]. A good understanding of these properties is a prerequisite for the design of distributed online social networks and the analysis of communication flow and influence in social networks. Another example for dynamic systems is the field of transportation networks. Well-known problems in this area are the computation of dynamic shortest paths [6] as well as efficient time table queries for train schedules [14]. Solutions are directly applicable to personal navigation systems and route planning for freight-traffic. A common approach for analyzing dynamic systems is to represent the changes to the network as a stream of s. This data stream is modeled as an infinite sequence of events [1], where a random access to the s is not possible. Hence, specialized algorithms for computing graphtheoretic properties in this setting are required. They must the values under consideration based on the incoming s to the graph instead of re-computing them since this would require an enormous overhead [12]. Many algorithms for the computation of graph-theoretic properties in dynamic networks have been developed, including connectivity measures [9], clustering coefficients [12], and shortest paths [8]. They can be used to gain insights into the evolution of specific properties of the system under investigation. The performance and memory footprint of computing a specific property for a dynamic system highly depends on the input data, the chosen algorithm, and the data structures used for the graph representation. Many algorithms use graph representations in form of adjacency lists. Others require specific data structures like an adjacency matrix. Some algorithms even depend on the ordering of adjacency lists or the structure of the adjacency matrix. In all cases, the choice of data structures used to represent the graph topology has a big impact on the memory consumption and the runtime of algorithms. Therefore, the implementation of the used data structures must be interchangeable. This allows users to select a combination of data structures and algorithms that is suited best for a specific scenario. Thereby, the best trade-off between runtime and memory consumption can be used for every scenario. Most existing graph-theoretic tools and frameworks like Pajek [4], Pregel [10], or SNAP [2] are designed for the analysis of static network topologies only and cannot easily be adapted to analyze dynamic graphs. With Stinger [3], Bader et al. introduced a dynamic graph structure that allows for efficient and parallel s. While it is considered to be highly efficient, it only provides the underlying data structures and no framework for the computation of graph-theoretic properties on top. The Stanford Network Analysis Platform 1 by Leskovec is a library that also supplies data structures for dynamic graphs. In contrast to Stinger, it supports a large set of 1
2 dynamic graph metrics but also lacks a surrounding framework for the comparison of different algorithms. Also, exchanging the basic graph data structures is not intended and an analysis of their impact on an algorithm s performance therefore not possible. In this paper, we discuss the basics for building a framework for the graph-theoretic analysis of dynamic networks. From a general workflow, applicable to the stream-based analysis of dynamic systems, we deduce a set of requirements for such a framework. We propose a framework architecture that fulfills these requirements. Based on this architecture, we describe the components of the Dynamic Network Analyzer (DNA), a first shot at developing the desired framework. The remainder of this paper is structured as follows: We discuss preliminaries and introduce our terminology in Section 2.. We outline requirements for a framework for the graph-theoretic analysis of dynamic networks in Section 3.. In Section 4., we present the architecture and components of the Dynamic Network Analyzer (DNA), a first implementation of such a framework we deveoped. Finally, we summarize the paper in Section 5. and give an outlook on the next steps for the development of DNA. 2. PRELIMINARIES AND TERMINOLOGY In this Section, we present our terminology for metrics, values, graphs, and their components. Then, we discuss s that reflect the dynamic changes of the system under consideration and how they are presented during the analysis, either as a stream or in form of batches.. Based on a general workflow for the analysis of dynamic networks, we describe how s can be used to graph data structures and values computed by metrics s, values, and metric collections A metric is a quantitative measure of a graph-theoretic property. Its result can be a single value, a list of values, a distribution, or any other kind of data. Common examples are the average node degree, the local clustering coefficients, and the hop plot. In the remainder of this paper, we do not differentiate between data types and simply refer to them as values. Many metrics investigate similar properties of a graph. An example are triangle-based metrics like the clustering coefficient, transitivity, and local clustering coefficients. All three metrics require an algorithm to count the number of triangles in the whole network or connected to each node. Such closely related metrics can easily be computed together, in most cases without overhead compared to the computation of a single metric. Therefore, we assume that similar metrics are grouped together to reduce their computation complexities. We denote such a set of related metrics as metric collection Graphs and weights A graph G i = (V i,e i ) at time i consists of a set of nodes V i and a set of edges E i representing the connections between these nodes. Depending on the type of network represented by the graph, the edges can be directed or undirected. Loops can be allowed or prohibited as well as the existence of parallel edges. To carry further information, weights can be added to nodes and edges respectively, assigning a value c to each corresponding entity. Assume without loss of generality that c R. Then, weight functions for nodes and edges are defined as follows: w V i : N i R w E i : E i R As the graph changes over time, the weights can change as well, rendering the functions w N i and w E i dependent on the timestamp i Updates Each change that occurs in a dynamic system between time t = i and t = j,i < j is described as an u U i, j. To reflect all potential changes that may happen to a system modeled as a weighted graph, six different types have to be considered: Adding or removing a node (u V + = v / V i, u V = v V i ), adding or removing an edge (u E + = e / E i, u E = e V i ), and changing the weight of a node or edge (u V w = (v,c), u E w = (e,c),v V j, e E j,c R) Stream of s In general, the aforementioned s arrive in a continuous data stream. Especially when investigating the properties of a live systems as they happen, single s can arrive one at a time, potentially from different sources. Hence, one possibility to analyze dynamic systems is to consider the stream of s one at a time Batch of s In some situations, it may be preferable to apply the s in batches instead of importing each individually. It allows for the parallelization of processing and thereby can speed up the analysis. In addition, a batch U i, j can be pre-processed to remove redundant s like, e.g., the addition and removal of the same edge in a single batch. This potential performance improvement comes at the price of a decreased measurement granularity since changes are not considered one at a time. Also, it may increase the computation complexity since a parallel application of s can lead to dependencies between certain s, which has to be handled additionally. In the remainder of this paper, we only use the notion of batches. Applying a stream of s is basically the same as applying batches that contain only a single element. Note that
3 a stream can simply be transformed into batches by grouping s as they arrive until a specified batch size is reached or after a pre-defined time has passed Workflow for the application of s The s of each batch are used to adapt the graph data structure and the values of metrics. The overall procedure of the application of batches is shown in Figure 1. It Input Data initialize Graph G 1 compute Value v 1 Updates U 1,2 Updates U 2,3 Graph G 2 Graph G 3 Value v 2 Value v 3 t = 1 t = 2 t = 3 Figure 1: Workflow for analyzing dynamic networks also represents the general workflow applied for the graphtheoretic analysis of dynamic networks. At the beginning of an analysis (t = 1), the data structures for representing the graph are initialized based on some input data. This instance of the graph (G 1 ) is used as input for a metric to compute the target value v 1. At time t = 2, the batch U 1,2 with all changes that occurred since t = 1 is applied: The data structures are d by adding new nodes and edges, removing obsolete ones, and updating weights in the data structures from G 1, transforming the graph into G 2. In addition, the metric processes the s to adapt the target value according to the given changes. As input, this adaptation uses the graph G 2, the old value v 1, the batch U 1,2, and possibly auxiliary information that the metric maintains to enable the incremental s. For every new batch U i,i+1, the same procedure is executed, effectively transforming the graph G i into G i+1 and updating the value v i to v i+1 based on the batch U i,i+1, the graph G i+1, the value v i and auxiliary information maintained by the metric Data structure s The batch U i, j represents the set of all s that occurred between timestamps i and j, i.e., the union of all node/edge additions/removals and weight changes: U i, j := U V + i, j U V i, j U E + i, j U E i, j U V w i, j UE w i, j Applying this batch of s to the graph G i then yields the network s topology at timestamp j as follows: V j := (V i U V + i, j )\UV i, j E j := (E i U E + i, j )\(U E i, j (U V i, j V i ) (V i U V i, j )) Analogously, the weights of nodes and edges at timestamp j can be expressed as follows: w (v/e,w ) U E w/v w w j (v/e) := w V /E i (v/e) v/e V i /E i otherwise 2.8. s During the application of a batch (cf. Figure 1), the value of a metric is d based on the batch of changes, the graph, the old value, and auxiliary information stored by the metric. It is important to distinguish the time when the of a value is triggered in relation to the process of the graph structures. Some metrics perform these s based on the graph structure after the batch was applied, i.e., they take G j and U i, j, as for simplicity stated above. Others require the graph before the application of the s as input, i.e., G i and the batch U i, j. While processing in batches is beneficial in some cases, it might not be feasible in others. Therefore, there are metrics that require the application of each at a time and hence need to preprocess each at a time before or after it was applied to the graph. For some metrics, an incremental might not be feasible because the storage required for maintaining auxiliary information exceeds the available memory. It is also possible that no algorithm for an efficient batch application is known. Such metrics have to be re-computed for the whole graph after the application of each batch to the data structures. To map these different requirements, we distinguish five modes of metric s: 1. Before Batch (BB), i.e., input G i and U i, j 2. After Batch (AB), i.e., input G j and U i, j 3. Before Single (BS), i.e., input G k 1 i, j, U i, j, and u k U i, j 4. After Single (AS), i.e., input G k i, j, U i, j, and u k U i, j 5. Re-Computation (RC), i.e., input G j Here, G k i, j denotes the graph at timestamp i after the application of the first k s of the batch U i, j. Note that a metric can apply s of different modes in order to adapt a value. Take for example the computation of the largest connected component of a network. Well-known algorithms for updating this metric are based on single s, hence the BS or AS
4 mode would be used. In case of metrics that the value independently of each other, the BB or AB modes would be preferred since they allow for the parallelization of the s in the batch. A simple re-computation of the degree distribution would be performed in RC mode. 3. FRAMEWORK REQUIREMENTS In this Section, we give a set of requirements that a framework must meet in order to comply with the workflow shown in Figure 1 and allow for the analysis of various systems modeled using different types of graphs Supported graph types A framework for the graph-theoretic analysis of dynamic networks should support the basic and most commonly used graph types: directed and undirected graphs with or without loops as well as multigraphs. It should also be possible to add weights of arbitrary type to nodes and edges Implementation of new metrics A well-defined interface should allow the easy implementation of new metrics and approaches for their computation. It must include possibilities to specify actions for each mode of computation as described in Section This should enable the combination of multiple metrics in the analysis of a specific system Implementation of new data structures As already mentioned, the data structures used for representing and accessing the graph information have a high influence on the performance of computing the investigated metrics. Therefore, it is crucial for a framework to allow for the implementation of various data structures for storing the graph information for the different types of graphs Combination of metrics and data structures When creating a framework for the graph-theoretic analysis of dynamic networks, the main focus is to allow for the simple implementation and evaluation of various metrics as well as data structures for representing the system graph. Therefore, it should be easy to combine metrics with different data structures to find the desired trade-off between computation performance and memory requirements. Also, arbitrary graph topologies and combinations should be supported to allow for the evaluation of a diverse set of scenarios Classes of input data A framework should support three classes of input data for initializing the graph as well as creating s: online, offline, and generated. Online data describes a stream of data that arrives at the system in real time like a stream of tweets. From such input data, the initial graph topology as well as s can be modeled. This is necessary to support the analysis of live data streams and networks modeled from them using the framework. As offline data, we describe input data that is already completely available at the beginning of the analysis like e.g., consecutive snapshots from an online social network. After reading the initial graph from a file, s for the transition to the next snapshot are read and applied from files as well. As a pre-processing option, it should also be possible to create s from two given input graphs instead of providing them explicitly. The support of offline data allows the analysis of collected network snapshots and traces from already recorded systems. Generated data is not given in form of actual structural information but as a specification how the initial graph as well as batches of s should be generated. An example would be a graph generator that initializes a random graph with a given number of nodes and edges as well as a batch generator that creates random edge additions and removals. Such graph and batch generators allow for the test and evaluation of metrics and their implementation in well-understood network models and is therefore crucial in the development process of new algorithms for the computation of metrics Aggregation of multiple runs In order to achieve statistically significant results, a framework must support the execution and aggregation of multiple runs of the same experiment. Each of those runs should be reproducible Result visualization Lastly, the results of the values computed by the metrics as well as statistics about runtime and memory consumption of the respective setup should be visualized to quickly show the impact of different settings and investigate the properties of the analyzed system. 4. ARCHITECTURE AND COMPONENTS OF DNA In this Section, we describe the architecture and components of the Dynamic Network Analyzer (DNA), our first implementation of a Java-based framework for the graphtheoretic analysis of dynamic networks. While the development of DNA is still a work-in-progress, it already meets most of the requirements stated in Section 3.. The Architecture of the Dynamic Network Analyzer is shown in Figure 2. It provides interfaces and basic implementations for graphs, s, and batches. The initialization of graphs is provided by the graph generator component while the generator component supplied batches of s.
5 Graph generator initialize Graph Plotter Update generator compile Batch compute Values Statistics Figure 2: Architecture of the Dynamic Network Analyzer Series run 4.4. s The metric component supplies interfaces for the implementation of arbitrary metrics that comply with the aforementioned five modes of updating their values. This implies the support for metrics that apply any combination of the five value modes: BB, AB, BS, AS, and RC. In addition, the metric component supplies basic means to compare the values computed by different implementations of the same metric. This enables the verification of potentially complex -based implementations using simple re-computations. Also, values of an approximation can automatically be compared with their exactly computed counterpart. Please note that the metric component in DNA can combine the computation of multiple metrics in a single implementation. Thereby, it reflects the concept of metric collections as introduced in Section 2.. The metric components allow for the computation of values grouped into metric collections. The series components is a wrapper for generating multiple runs of the same scenario. For each of these runs, the computed and d values for each batch application are stored together with statistics about runtimes and memory requirements. These results are visualized using the plotter component Graphs and data structures As its basic component, DNA provides data structures for storing, updating, and querying graphs. So far, only weighted, directed graphs have been implemented. All data structures support the assignment of weights to nodes and edges. For simplicity, s for adding or removing nodes have been omitted in this initial version to simplify the process of graph data structures. Complementary to basic implementations of graphs, the graph component allows for the addition and use of further ones through a well-defined interface Graph generator The graph generator component provides the initialization of the graph. In its current implementation, it can read the graph from a file or generate it from scratch based on a specification Update generator and batches The generator provides the s of graph structure and weights to the system. Currently, it is only supported to read s from files or generate them from a specification. It can either pass down each separately as a stream or combine multiple s into batches of predefined size Series Runs have to be repeated multiple times and their results must be aggregated. The series component was developed to serve this purpose. Taking a graph generator, an generator, and a list of metrics as input, the series performs multiple runs of the same setup. The results and statistics for each batch application of every run are stored in a predefined folder structure. The procedure applied during the generation of a series is outlined in Algorithm 1. As input, it takes a graph generator Input: GraphGenerator gg, UpdateGenerator ug, m, int runs, int batches Output: Series data foreach r [1..runs]do Graph g = gg.generate(); graph initialization m.init(g); initial computation m.write(run.$r/batch.0/metric.0/) Runtimes.write(run.$r/batch.0/runtimes) Statistics.write(run.$r/batch.0/statistics) foreach b [1..batches]do Batch batch = ug.generate(); generate batch m.beforebatch(g, batch); BB mode foreach Update u : batch do m.beforeupdate(g, b, u); BS mode g.apply(u); application m.afterupdate(g, b, u); AS mode end m.afterbatch(g, batch); AB mode m.recomputation(g); RC mode m.write(run.$r/batch.$b/metric.0/) Runtimes.write(run.r/batch.b/runtimes) Statistics.write(run.r/batch.b/statistics) end end Aggregation.write(aggr/); Algorithm 1: Series generation
6 and an generator that supplies the initial graph as well as batches of s to it. In addition, the metric to be evaluated is given as well as the desired number of runs and batches to be executed. For brevity, only a single metric is given as input in this procedure. Normally, a list of metrics is given, all of which are computed in parallel during the execution. For each run, a new graph instance is initialized by the graph generator. The metric component computes its values for the initial graph which are written to the initial output folder, together with runtimes and statistics. Afterwards, the given number of batches is generated by the generator. Each batch is applied to the graph and the metric is d accordingly. In order to comply with the five modes for updating metric values described before, methods for updating the metric before, during, and after updating the graph structures are called. Please note, that it depends on the metric which methods are actually implemented and hence used to the computed values. Finally, the computed metric values are written to the filesystem together with runtimes and statistics. After each run is executed, the generated data is aggregated and written to a dedicated directory. The data of each run is stored in a separate folder (run1, run2,... ). For every batch, the computed values and statistics about its application are stored in a corresponding folder (batch.1, batch.2,... ). The values of the metric computations after the initialization of the graph are stored in batch.0. Inside of these batch folders, the values of each metric collection after performing the batch s s are stored together with runtimes and statistics Plotter The plotter component uses the data produced by a series to generate plots for visualizing its results. This includes plots of the metric values and their development over time, runtimes of metric and graph s, and other statistics computed during the application of each batch. In the current implementation, Gnuplot [13] is used to generate plots from the generated results and statistics. 5. SUMMARY In this paper, we outlined the basics for building a framework for the graph-theoretic analysis of dynamic networks. First, we described a general workflow for the analysis of dynamic networks in a scenario based on data streams as input for the changes in the system. Then, we listed a set of requirements a framework must to meet in order to comply with this workflow and allow for a straight-forward development of new algorithms and data structures and the analysis of arbitrary input data. Based on these requirements, we developed a first prototype of such a framework, the Dynamic Network Analyzer (DNA). So far, DNA already meets most of the requirements for the desired framework. While not all considered graph types and mechanisms are yet implemented, well-defined interfaces for data structures and metrics have already been developed. Different implementations of metrics and data structures can already be freely combined. Currently, only offline and generated input data is supported, leaving the integration of online data streams as future work. The series component already fulfills the basic requirements regarding the execution and aggregation of multiple runs. A global integration of seed-based pseudo-random behavior of generators ensures the reproducibility of experiments. The visualization of results is covered by the plotter component. Currently, we are extending DNA s data structures to allow for the addition and removal of nodes as part of the process. We are also working on the integration of online data sources to the graph and generator components. Also, the pre-processing of multiple graph snapshots into initial graph and batches of s is an ongoing task. Furthermore, we are considering the integration of interfaces to graph databases as a data structure into DNA. In addition, we are investigating possibilities to add general capabilities and interfaces for the parallelization of metric computations as present in many frameworks and libraries for the analysis of static network topologies. *Bibliography [1] Babcock et al. Models and issues in data stream systems. In ACM SIGMOD-SIGACT, [2] Bader et al. Snap, small-world network analysis and partitioning: an open-source parallel graph framework for the exploration of large-scale networks. In Parallel and Distributed Processing, [3] Bader et al. Stinger: Spatio-temporal interaction networks and graphs (sting) extensible representation. Technical Report, [4] Batagelj et al. Pajek-program for large network analysis. Connections, [5] Braha et al. Time-dependent complex networks: Dynamic centrality, dynamic motifs, and cycles of social interactions. In Adaptive Networks [6] Chabini. Discrete dynamic shortest path problems in transportation applications: Complexity and algorithms with optimal run time. Journal of the Transportation Research Board, [7] Kossinets et al. Empirical analysis of an evolving social network. Science, [8] Likhachev et al. Anytime dynamic a*: An anytime, replanning algorithm. In International conference on automated planning and scheduling (ICAPS), [9] Madduri et al. Compact graph representations and parallel connectivity algorithms for massive dynamic network analysis. In Parallel & Distributed Processing, [10] Malewicz et al. Pregel: a system for large-scale graph processing. In ACM SIGMOD International Conference on Management of data, [11] Mucha et al. Community structure in time-dependent, multiscale, and multiplex networks. Science, [12] Edigerand others. Massive streaming data analytics: A case study with clustering coefficients. In Parallel & Distributed Processing, [13] Racine. gnuplot 4.0: a portable interactive plotting utility. Journal of Applied Econometrics, [14] Stølting et al. Time-dependent networks as models to achieve fast exact time-table queries. Electronic Notes in Theoretical Computer Science, 2004.
Graph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
Distance Degree Sequences for Network Analysis
Universität Konstanz Computer & Information Science Algorithmics Group 15 Mar 2005 based on Palmer, Gibbons, and Faloutsos: ANF A Fast and Scalable Tool for Data Mining in Massive Graphs, SIGKDD 02. Motivation
Component visualization methods for large legacy software in C/C++
Annales Mathematicae et Informaticae 44 (2015) pp. 23 33 http://ami.ektf.hu Component visualization methods for large legacy software in C/C++ Máté Cserép a, Dániel Krupp b a Eötvös Loránd University [email protected]
MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1
A Platform for Supporting Data Analytics on Twitter: Challenges and Objectives 1 Yannis Stavrakas Vassilis Plachouras IMIS / RC ATHENA Athens, Greece {yannis, vplachouras}@imis.athena-innovation.gr Abstract.
BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic
BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop
Practical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
Fault Analysis in Software with the Data Interaction of Classes
, pp.189-196 http://dx.doi.org/10.14257/ijsia.2015.9.9.17 Fault Analysis in Software with the Data Interaction of Classes Yan Xiaobo 1 and Wang Yichen 2 1 Science & Technology on Reliability & Environmental
EFFICIENT DETECTION IN DDOS ATTACK FOR TOPOLOGY GRAPH DEPENDENT PERFORMANCE IN PPM LARGE SCALE IPTRACEBACK
EFFICIENT DETECTION IN DDOS ATTACK FOR TOPOLOGY GRAPH DEPENDENT PERFORMANCE IN PPM LARGE SCALE IPTRACEBACK S.Abarna 1, R.Padmapriya 2 1 Mphil Scholar, 2 Assistant Professor, Department of Computer Science,
Parallel Algorithms for Small-world Network. David A. Bader and Kamesh Madduri
Parallel Algorithms for Small-world Network Analysis ayssand Partitioning atto g(s (SNAP) David A. Bader and Kamesh Madduri Overview Informatics networks, small-world topology Community Identification/Graph
Copyright. Network and Protocol Simulation. What is simulation? What is simulation? What is simulation? What is simulation?
Copyright Network and Protocol Simulation Michela Meo Maurizio M. Munafò [email protected] [email protected] Quest opera è protetta dalla licenza Creative Commons NoDerivs-NonCommercial. Per
Test Coverage Criteria for Autonomous Mobile Systems based on Coloured Petri Nets
9th Symposium on Formal Methods for Automation and Safety in Railway and Automotive Systems Institut für Verkehrssicherheit und Automatisierungstechnik, TU Braunschweig, 2012 FORMS/FORMAT 2012 (http://www.forms-format.de)
Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network
, pp.273-284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
STINGER: High Performance Data Structure for Streaming Graphs
STINGER: High Performance Data Structure for Streaming Graphs David Ediger Rob McColl Jason Riedy David A. Bader Georgia Institute of Technology Atlanta, GA, USA Abstract The current research focus on
Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team
Software tools for Complex Networks Analysis Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team MOTIVATION Why do we need tools? Source : nature.com Visualization Properties extraction
Performance of networks containing both MaxNet and SumNet links
Performance of networks containing both MaxNet and SumNet links Lachlan L. H. Andrew and Bartek P. Wydrowski Abstract Both MaxNet and SumNet are distributed congestion control architectures suitable for
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
A Multi-layered Domain-specific Language for Stencil Computations
A Multi-layered Domain-specific Language for Stencil Computations Christian Schmitt, Frank Hannig, Jürgen Teich Hardware/Software Co-Design, University of Erlangen-Nuremberg Workshop ExaStencils 2014,
A Comparison of General Approaches to Multiprocessor Scheduling
A Comparison of General Approaches to Multiprocessor Scheduling Jing-Chiou Liou AT&T Laboratories Middletown, NJ 0778, USA [email protected] Michael A. Palis Department of Computer Science Rutgers University
Research Statement Immanuel Trummer www.itrummer.org
Research Statement Immanuel Trummer www.itrummer.org We are collecting data at unprecedented rates. This data contains valuable insights, but we need complex analytics to extract them. My research focuses
This exam contains 13 pages (including this cover page) and 18 questions. Check to see if any pages are missing.
Big Data Processing 2013-2014 Q2 April 7, 2014 (Resit) Lecturer: Claudia Hauff Time Limit: 180 Minutes Name: Answer the questions in the spaces provided on this exam. If you run out of room for an answer,
2 Associating Facts with Time
TEMPORAL DATABASES Richard Thomas Snodgrass A temporal database (see Temporal Database) contains time-varying data. Time is an important aspect of all real-world phenomena. Events occur at specific points
Social Media Mining. Graph Essentials
Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures
REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc])
305 REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN COMPUTER SCIENCE (MSc[CompSc]) (See also General Regulations) Any publication based on work approved for a higher degree should contain a reference
The Role of Automation Systems in Management of Change
The Role of Automation Systems in Management of Change Similar to changing lanes in an automobile in a winter storm, with change enters risk. Everyone has most likely experienced that feeling of changing
Open Source Software Developer and Project Networks
Open Source Software Developer and Project Networks Matthew Van Antwerp and Greg Madey University of Notre Dame {mvanantw,gmadey}@cse.nd.edu Abstract. This paper outlines complex network concepts and how
Routing in packet-switching networks
Routing in packet-switching networks Circuit switching vs. Packet switching Most of WANs based on circuit or packet switching Circuit switching designed for voice Resources dedicated to a particular call
White Paper. How Streaming Data Analytics Enables Real-Time Decisions
White Paper How Streaming Data Analytics Enables Real-Time Decisions Contents Introduction... 1 What Is Streaming Analytics?... 1 How Does SAS Event Stream Processing Work?... 2 Overview...2 Event Stream
Competitive Analysis of On line Randomized Call Control in Cellular Networks
Competitive Analysis of On line Randomized Call Control in Cellular Networks Ioannis Caragiannis Christos Kaklamanis Evi Papaioannou Abstract In this paper we address an important communication issue arising
Machine Learning over Big Data
Machine Learning over Big Presented by Fuhao Zou [email protected] Jue 16, 2014 Huazhong University of Science and Technology Contents 1 2 3 4 Role of Machine learning Challenge of Big Analysis Distributed
QoSIP: A QoS Aware IP Routing Protocol for Multimedia Data
QoSIP: A QoS Aware IP Routing Protocol for Multimedia Data Md. Golam Shagadul Amin Talukder and Al-Mukaddim Khan Pathan* Department of Computer Science and Engineering, Metropolitan University, Sylhet,
SPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
Optimization and analysis of large scale data sorting algorithm based on Hadoop
Optimization and analysis of large scale sorting algorithm based on Hadoop Zhuo Wang, Longlong Tian, Dianjie Guo, Xiaoming Jiang Institute of Information Engineering, Chinese Academy of Sciences {wangzhuo,
Evaluating partitioning of big graphs
Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist [email protected], [email protected], [email protected] Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed
Report on the Dagstuhl Seminar Data Quality on the Web
Report on the Dagstuhl Seminar Data Quality on the Web Michael Gertz M. Tamer Özsu Gunter Saake Kai-Uwe Sattler U of California at Davis, U.S.A. U of Waterloo, Canada U of Magdeburg, Germany TU Ilmenau,
Customer Analytics. Turn Big Data into Big Value
Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data
2) Write in detail the issues in the design of code generator.
COMPUTER SCIENCE AND ENGINEERING VI SEM CSE Principles of Compiler Design Unit-IV Question and answers UNIT IV CODE GENERATION 9 Issues in the design of code generator The target machine Runtime Storage
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
Six Degrees of Separation in Online Society
Six Degrees of Separation in Online Society Lei Zhang * Tsinghua-Southampton Joint Lab on Web Science Graduate School in Shenzhen, Tsinghua University Shenzhen, Guangdong Province, P.R.China [email protected]
SOFTWARE TESTING TRAINING COURSES CONTENTS
SOFTWARE TESTING TRAINING COURSES CONTENTS 1 Unit I Description Objectves Duration Contents Software Testing Fundamentals and Best Practices This training course will give basic understanding on software
Cost Model: Work, Span and Parallelism. 1 The RAM model for sequential computation:
CSE341T 08/31/2015 Lecture 3 Cost Model: Work, Span and Parallelism In this lecture, we will look at how one analyze a parallel program written using Cilk Plus. When we analyze the cost of an algorithm
Cloud Enabled Emergency Navigation Using Faster-than-real-time Simulation
Cloud Enabled Emergency Navigation Using Faster-than-real-time Simulation Huibo Bi and Erol Gelenbe Intelligent Systems and Networks Group Department of Electrical and Electronic Engineering Imperial College
Complex Event Processing (CEP) Why and How. Richard Hallgren BUGS 2013-05-30
Complex Event Processing (CEP) Why and How Richard Hallgren BUGS 2013-05-30 Objectives Understand why and how CEP is important for modern business processes Concepts within a CEP solution Overview of StreamInsight
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
Recent Advances in Periscope for Performance Analysis and Tuning
Recent Advances in Periscope for Performance Analysis and Tuning Isaias Compres, Michael Firbach, Michael Gerndt Robert Mijakovic, Yury Oleynik, Ventsislav Petkov Technische Universität München Yury Oleynik,
Sanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 [email protected] 1. Introduction The field of data mining and knowledgee discovery is emerging as a
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM IRANDOC CASE STUDY Ammar Jalalimanesh a,*, Elaheh Homayounvala a a Information engineering department, Iranian Research Institute for
Big Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com StreamHorizon & Big Data Integrates into your Data Processing Pipeline Seamlessly integrates at any point of your your data processing pipeline Implements
How To Provide Qos Based Routing In The Internet
CHAPTER 2 QoS ROUTING AND ITS ROLE IN QOS PARADIGM 22 QoS ROUTING AND ITS ROLE IN QOS PARADIGM 2.1 INTRODUCTION As the main emphasis of the present research work is on achieving QoS in routing, hence this
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
Scaling 10Gb/s Clustering at Wire-Speed
Scaling 10Gb/s Clustering at Wire-Speed InfiniBand offers cost-effective wire-speed scaling with deterministic performance Mellanox Technologies Inc. 2900 Stender Way, Santa Clara, CA 95054 Tel: 408-970-3400
Real Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications
GEDAE TM - A Graphical Programming and Autocode Generation Tool for Signal Processor Applications Harris Z. Zebrowitz Lockheed Martin Advanced Technology Laboratories 1 Federal Street Camden, NJ 08102
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
Beyond the Stars: Revisiting Virtual Cluster Embeddings
Beyond the Stars: Revisiting Virtual Cluster Embeddings Matthias Rost Technische Universität Berlin September 7th, 2015, Télécom-ParisTech Joint work with Carlo Fuerst, Stefan Schmid Published in ACM SIGCOMM
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION Exploration is a process of discovery. In the database exploration process, an analyst executes a sequence of transformations over a collection of data structures to discover useful
Optimization of ETL Work Flow in Data Warehouse
Optimization of ETL Work Flow in Data Warehouse Kommineni Sivaganesh M.Tech Student, CSE Department, Anil Neerukonda Institute of Technology & Science Visakhapatnam, India. [email protected] P Srinivasu
Tool Support for Inspecting the Code Quality of HPC Applications
Tool Support for Inspecting the Code Quality of HPC Applications Thomas Panas Dan Quinlan Richard Vuduc Center for Applied Scientific Computing Lawrence Livermore National Laboratory P.O. Box 808, L-550
Six Strategies for Building High Performance SOA Applications
Six Strategies for Building High Performance SOA Applications Uwe Breitenbücher, Oliver Kopp, Frank Leymann, Michael Reiter, Dieter Roller, and Tobias Unger University of Stuttgart, Institute of Architecture
Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect
Matteo Migliavacca (mm53@kent) School of Computing Conjugating data mood and tenses: Simple past, infinite present, fast continuous, simpler imperative, conditional future perfect Simple past - Traditional
Development and Runtime Platform and High-speed Processing Technology for Data Utilization
Development and Runtime Platform and High-speed Processing Technology for Data Utilization Hidetoshi Kurihara Haruyasu Ueda Yoshinori Sakamoto Masazumi Matsubara Dramatic increases in computing power and
Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel
Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined
Graph Database Proof of Concept Report
Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment
Big Data looks Tiny from the Stratosphere
Volker Markl http://www.user.tu-berlin.de/marklv [email protected] Big Data looks Tiny from the Stratosphere Data and analyses are becoming increasingly complex! Size Freshness Format/Media Type
A Framework for the Delivery of Personalized Adaptive Content
A Framework for the Delivery of Personalized Adaptive Content Colm Howlin CCKF Limited Dublin, Ireland [email protected] Danny Lynch CCKF Limited Dublin, Ireland [email protected] Abstract
NIST/ITL CSD Biometric Conformance Test Software on Apache Hadoop. September 2014. National Institute of Standards and Technology (NIST)
NIST/ITL CSD Biometric Conformance Test Software on Apache Hadoop September 2014 Dylan Yaga NIST/ITL CSD Lead Software Designer Fernando Podio NIST/ITL CSD Project Manager National Institute of Standards
TeCReVis: A Tool for Test Coverage and Test Redundancy Visualization
TeCReVis: A Tool for Test Coverage and Test Redundancy Visualization Negar Koochakzadeh Vahid Garousi Software Quality Engineering Research Group University of Calgary, Canada Acknowledging funding and
SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH NATIONAL UNIVERSITY OF SINGAPORE
SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH NATIONAL UNIVERSITY OF SINGAPORE 2012 SPANNING CACTI FOR STRUCTURALLY CONTROLLABLE NETWORKS NGO THI TU ANH (M.Sc., SFU, Russia) A THESIS
ModelingandSimulationofthe OpenSourceSoftware Community
ModelingandSimulationofthe OpenSourceSoftware Community Yongqin Gao, GregMadey Departmentof ComputerScience and Engineering University ofnotre Dame ygao,[email protected] Vince Freeh Department of ComputerScience
The basic data mining algorithms introduced may be enhanced in a number of ways.
DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,
A New Fault Tolerant Routing Algorithm For GMPLS/MPLS Networks
A New Fault Tolerant Routing Algorithm For GMPLS/MPLS Networks Mohammad HossienYaghmae Computer Department, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran [email protected]
UniGR Workshop: Big Data «The challenge of visualizing big data»
Dept. ISC Informatics, Systems & Collaboration UniGR Workshop: Big Data «The challenge of visualizing big data» Dr Ir Benoît Otjacques Deputy Scientific Director ISC The Future is Data-based Can we help?
Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach
Parallel Ray Tracing using MPI: A Dynamic Load-balancing Approach S. M. Ashraful Kadir 1 and Tazrian Khan 2 1 Scientific Computing, Royal Institute of Technology (KTH), Stockholm, Sweden [email protected],
GigaSpaces Real-Time Analytics for Big Data
GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and
Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
Graph Theory and Complex Networks: An Introduction. Chapter 06: Network analysis
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.0, [email protected] Chapter 06: Network analysis Version: April 8, 04 / 3 Contents Chapter
Internet Firewall CSIS 4222. Packet Filtering. Internet Firewall. Examples. Spring 2011 CSIS 4222. net15 1. Routers can implement packet filtering
Internet Firewall CSIS 4222 A combination of hardware and software that isolates an organization s internal network from the Internet at large Ch 27: Internet Routing Ch 30: Packet filtering & firewalls
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON 2 The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Gartner The emergence
BW-EML SAP Standard Application Benchmark
BW-EML SAP Standard Application Benchmark Heiko Gerwens and Tobias Kutning (&) SAP SE, Walldorf, Germany [email protected] Abstract. The focus of this presentation is on the latest addition to the
Distributed Dynamic Load Balancing for Iterative-Stencil Applications
Distributed Dynamic Load Balancing for Iterative-Stencil Applications G. Dethier 1, P. Marchot 2 and P.A. de Marneffe 1 1 EECS Department, University of Liege, Belgium 2 Chemical Engineering Department,
Dynamic Data in terms of Data Mining Streams
International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining
The Heat Equation. Lectures INF2320 p. 1/88
The Heat Equation Lectures INF232 p. 1/88 Lectures INF232 p. 2/88 The Heat Equation We study the heat equation: u t = u xx for x (,1), t >, (1) u(,t) = u(1,t) = for t >, (2) u(x,) = f(x) for x (,1), (3)
Microblogging Queries on Graph Databases: An Introspection
Microblogging Queries on Graph Databases: An Introspection ABSTRACT Oshini Goonetilleke RMIT University, Australia [email protected] Timos Sellis RMIT University, Australia [email protected]
Stability of QOS. Avinash Varadarajan, Subhransu Maji {avinash,smaji}@cs.berkeley.edu
Stability of QOS Avinash Varadarajan, Subhransu Maji {avinash,smaji}@cs.berkeley.edu Abstract Given a choice between two services, rest of the things being equal, it is natural to prefer the one with more
Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro
Subgraph Patterns: Network Motifs and Graphlets Pedro Ribeiro Analyzing Complex Networks We have been talking about extracting information from networks Some possible tasks: General Patterns Ex: scale-free,
Mining a Change-Based Software Repository
Mining a Change-Based Software Repository Romain Robbes Faculty of Informatics University of Lugano, Switzerland 1 Introduction The nature of information found in software repositories determines what
How to Choose Between Hadoop, NoSQL and RDBMS
How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A
BSPCloud: A Hybrid Programming Library for Cloud Computing *
BSPCloud: A Hybrid Programming Library for Cloud Computing * Xiaodong Liu, Weiqin Tong and Yan Hou Department of Computer Engineering and Science Shanghai University, Shanghai, China [email protected],
A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc.
A Hybrid Modeling Platform to meet Basel II Requirements in Banking Jeffery Morrision, SunTrust Bank, Inc. Introduction: The Basel Capital Accord, ready for implementation in force around 2006, sets out
Violin: A Framework for Extensible Block-level Storage
Violin: A Framework for Extensible Block-level Storage Michail Flouris Dept. of Computer Science, University of Toronto, Canada [email protected] Angelos Bilas ICS-FORTH & University of Crete, Greece
Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole
Paper BB-01 Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole ABSTRACT Stephen Overton, Overton Technologies, LLC, Raleigh, NC Business information can be consumed many
Information Management course
Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])
Fault-Tolerant Routing Algorithm for BSN-Hypercube Using Unsafety Vectors
Journal of omputational Information Systems 7:2 (2011) 623-630 Available at http://www.jofcis.com Fault-Tolerant Routing Algorithm for BSN-Hypercube Using Unsafety Vectors Wenhong WEI 1,, Yong LI 2 1 School
