Massive-Scale Streaming Analytics. David A. Bader
- Esmond Floyd
- 10 years ago
Transcription
1 Massive-Scale Streaming Analytics David A. Bader
2 Dr. David A. Bader Full Professor, Computational Science and Engineering Executive Director for High Performance Computing. IEEE Fellow, AAAS Fellow. Interests are at the intersection of high-performance computing and real-world applications, including computational biology and genomics and massive-scale data analytics. Over $165M of research awards Steering Committees of the major HPC conferences, IPDPS and HiPC Multiple editorial boards in parallel and high performance computing EIC of IEEE Transactions on Parallel and Distributed Systems Elected chair of IEEE and SIAM committees on HPC 230+ publications, 4,300 citations, h-index 38 National Science Foundation CAREER Award recipient Directed the Sony-Toshiba-IBM Center for the Cell/B.E. Processor Founder of the Graph500 List for benchmarking Big Data computing platforms Recognized as a RockStar of High Performance Computing by InsideHPC in 2012 and as HPCwire's People to Watch in 2012 and David A. Bader 2
3 Outline Overview of Georgia Tech STINGER: Streaming Analytics Case study 1: Seed Set Expansion Case study 2: Hybrid Betweenness Centrality Future architectures Conclusions David A. Bader 3
4 School of Computational Science & Engineering Modern science = theory + experimentation + computation. We solve real-world challenges through advanced computational techniques & data analysis. DAVID A. BADER 4
5 Core Research Areas High Performance Computing Big Data Machine Learning Analytics & Visualization Cybersecurity DAVID A. BADER 5
6 Big Data Analytics: The Challenge A Tsunami of Data Internet + sensors, cameras, wireless networks, etc. Limited ability to extract knowledge and insight Need algorithms that scale to massive, complex data sets Every minute of every day 48hrs of video uploaded to YouTube 100,000+ Tweets 3,600 Instagram photos 47,000 Apple App Store downloads 571 new websites created $272,070 spent shopping online DAVID A. BADER 6
7 Big Data Analytics: The Payoff Applications in: Epidemiology Astrophysics Bioinformatics Social Networks Homeland Security Text Analysis Biometric Recognition DAVID A. BADER 7
8 School of CSE: Just the Facts Founded: 2005 Chair: David Bader Faculty: 12 tenure track 4 joint 10 adjunct 10 research Administrative Staff: 5 Research Expenditures: $8.8 million (FY 2013) DAVID A. BADER 8
9 CRUISE: Computing Research Undergraduate Intern Summer Experience Encourage students to consider graduate studies Diverse student participation Ten week summer research projects CRUISE-wide events DAVID A. BADER 9
10 Graduate Education MS Degrees MS Computational Science & Engineering MS Computer Science MS Data Analytics new! PhD Degrees PhD Computational Science & Engineering PhD Computer Science DAVID A. BADER 10
11 Award-Winning Faculty 12 tenure-track faculty members 1 Regents professor 4 NSF CAREER awards 2 IEEE and 2 AAAS fellows, 1 SIAM fellow 4 joint, 10 adjunct, and 10 research faculty DAVID A. BADER 11
12 Outline Overview of Georgia Tech STINGER: Streaming Analytics Case study 1: Seed Set Expansion Case study 2: Hybrid Betweenness Centrality Future architectures Conclusions David A. Bader 12
13 STING Initiative: Focusing on Globally Significant Grand Challenges Many globally-significant grand challenges can be modeled by Spatio-Temporal Interaction Networks and Graphs (or STING). Emerging real-world graph problems include detecting community structure in large social networks, defending the nation against cyber-based attacks, discovering insider threats (e.g. Ft. Hood shooter, WikiLeaks), improving the resilience of the electric power grid, and detecting and preventing disease in human populations. Unlike traditional applications in computational science and engineering, solving these problems at scale often raises new research challenges because of sparsity and the lack of locality in the massive data, design of parallel algorithms for massive, streaming data analytics, and the need for new exascale supercomputers that are energy-efficient, resilient, and easy-to-program. David A. Bader 13
14 Big Data problems need Graph Analysis Health Care Finding outbreaks, population epidemiology Social Networks Advertising, searching, grouping, influence Intelligence Decisions at scale, regulating algorithms Systems Biology Understanding interactions, drug design Power Grid Disruptions, conversion Simulation Discrete events, cracking meshes Graphs are a unifying motif for data analysis. Changing and dynamic graphs are important! David A. Bader 14
15 Data rates and volumes are immense Facebook: ~1 billion users average 130 friends 30 billion pieces of content shared / month Twitter: 500 million active users 340 million tweets / day Internet 100s of exabytes / year 300 million new websites per year 48 hours of video to YouTube per minute 30,000 YouTube videos played per second David A. Bader 15
16 Massive-Scale Streaming Analytics Historically, HPC uses batch processing style where a program and a static data set are scheduled to compute in the next available slot. Today, data is overwhelming in volume and rate, and we struggle to keep up with these streams. Fundamental computer science research is needed in: the design of streaming architectures, and data structures and algorithms that can compute important analytics while sitting in the middle of these torrential flows. David A. Bader 16
17 Our focus is streaming graphs Analysts (A, B, t1, poke) (A, C, t2, msg) (A, D, t3, view wall) (A, D, t4, post) (B, A, t2, poke) (B, A, t3, view wall) (B, A, t4, msg) Change detection Flows, Clustering, Centrality Structural change Key actors, anomalies David A. Bader 17
18 Graph Analytics for Social Networks Are there new graph techniques? Do they parallelize? Can the computational systems (algorithms, machines) handle massive networks with millions to billions of individuals? Can the techniques tolerate noisy data, massive data, streaming data, etc.? Communities may overlap, exhibit different properties and sizes, and be driven by different models Detect communities (static or emerging) Identify important individuals Detect anomalous behavior Given a community, find a representative member of the community Given a set of individuals, find the best community that includes them David A. Bader 18
19 Streaming Queries How are individual graph metrics (e.g. clustering coefficients) changing? What are the patterns in the changes? Are there seasonal variations? What are responses to events? What are temporal anomalies in the graph? Do key members in clusters/communities change? Are there indicators of event responses before they are obvious? David A. Bader 19
20 Additional Applications Cybersecurity Determine if new packets are allowed or represent new threat in < 5ms... Is the transfer a virus? Illicit? Credit fraud forensics detection monitoring Integrate all the customer's data Becoming closer to real-time, massive scale Bioinformatics Construct gene sequences, analyze protein interactions, map brain interactions Amount of new data arriving is growing massively Power network planning, monitoring, and re-routing Already nation-scale problem As more power sources come online (rooftop solar)... David A. Bader 20
21 Analyzing Twitter for Social Good ICPP 2010 Image credit: bioethicsinstitute.org David A. Bader 21
22 STINGER: as an analysis package Streaming edge insertions and deletions: Performs new edge insertions, updates, and deletions in batches or individually. Streaming clustering coefficients: Tracks the local and global clustering coefficients of a graph under both edge insertions and deletions. Streaming connected components: Accurately tracks the connected components of a graph with insertions and deletions. Streaming community detection: Track and update the community structures within the graph as they change. Parallel agglomerative clustering: Find clusters that are optimized for a user-defined edge scoring function. Streaming Betweenness Centrality: Find the key points within information flows and structural vulnerabilities. K-core Extraction: Extract additional communities and filter noisy high-degree vertices. Classic breadth-first search: Performs a parallel breadth-first search of the graph starting at a given source vertex to find shortest paths. Optimized to update at rates of over 3 million edges per second on graphs of one billion edges David A. Bader 22
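The streaming clustering coefficient idea listed above can be illustrated with a minimal Python sketch. It is not STINGER code: the class name `StreamingTriangles` and its methods are hypothetical, and the sketch assumes an undirected simple graph held in memory. The point it shows is that an edge insertion or deletion only changes the triangle counts of the two endpoints and their common neighbors, so the metric can be updated without a full recount.

```python
from collections import defaultdict


class StreamingTriangles:
    """Hypothetical sketch: per-vertex triangle counts kept current
    under edge insertions and deletions (undirected simple graph)."""

    def __init__(self):
        self.adj = defaultdict(set)   # vertex -> set of neighbors
        self.tri = defaultdict(int)   # vertex -> number of triangles it is in

    def insert_edge(self, u, v):
        if u == v or v in self.adj[u]:
            return                    # ignore self-loops and duplicates
        # Every common neighbor w closes one new triangle (u, v, w).
        common = self.adj[u] & self.adj[v]
        for w in common:
            self.tri[w] += 1
        self.tri[u] += len(common)
        self.tri[v] += len(common)
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete_edge(self, u, v):
        if v not in self.adj[u]:
            return
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # Every common neighbor w loses the triangle (u, v, w).
        common = self.adj[u] & self.adj[v]
        for w in common:
            self.tri[w] -= 1
        self.tri[u] -= len(common)
        self.tri[v] -= len(common)

    def local_cc(self, v):
        """Local clustering coefficient: triangles / possible neighbor pairs."""
        d = len(self.adj[v])
        return 0.0 if d < 2 else 2.0 * self.tri[v] / (d * (d - 1))
```

Only the neighborhoods of the two endpoints are touched per update, which is what makes per-edge updates cheap compared with recounting all triangles.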
23 STINGER: Where do you get it? Gateway to code, development, documentation, presentations... Users / contributors / questioners: Georgia Tech, PNNL, CMU, Berkeley, Intel, Cray, NVIDIA, IBM, Federal Government, Ionic Security, Citi David A. Bader 23
24 STING Extensible Representation (STINGER) Enhanced representation for dynamic graphs, developed in consultation with David A. Bader, Jon Berry, Adam Amos-Binks, Daniel Chavarría-Miranda, Charles Hastings, Kamesh Madduri, and Steven C. Poulos. Design goals: Be useful for the entire large graph community Portable semantics and high-level optimizations across multiple platforms & frameworks (XMT C, MTGL, etc.) Permit good performance: No single structure is optimal for all. Assume globally addressable memory access Support multiple, parallel readers and a single writer Operations: Insert/update & delete both vertices & edges Aging-off: Remove old edges (by timestamp) Serialization to support checkpointing, etc. David A. Bader 24
25 STING Extensible Representation Semi-dense edge list blocks with free space Compactly stores timestamps, types, weights Maps from application IDs to storage IDs Deletion by negating IDs, separate compaction David A. Bader 25
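A toy Python sketch of the layout described on this slide may help make it concrete. It is an illustration under stated assumptions, not the STINGER implementation: the names `EdgeBlock`, `BlockedAdjacency`, and `BLOCK_SIZE` are hypothetical, vertex IDs are assumed to be non-negative integers, and a deleted slot is marked by storing `-dst - 1` (a negation that also works for ID 0). Separate compaction and the application-ID mapping are left out.

```python
BLOCK_SIZE = 4  # hypothetical; real block sizes would be tuned to the platform


class EdgeBlock:
    """Fixed-size block of edge slots for one source vertex."""

    def __init__(self):
        self.neighbors = [None] * BLOCK_SIZE   # None = slot never used
        self.weights = [0.0] * BLOCK_SIZE
        self.timestamps = [0] * BLOCK_SIZE
        self.next = None                       # next block in the chain


class BlockedAdjacency:
    def __init__(self):
        self.head = {}                         # source id -> first EdgeBlock

    def insert(self, src, dst, weight, ts):
        free, last = None, None
        block = self.head.get(src)
        while block is not None:
            for i, n in enumerate(block.neighbors):
                if n == dst:                   # edge exists: update in place
                    block.weights[i], block.timestamps[i] = weight, ts
                    return
                if free is None and (n is None or n < 0):
                    free = (block, i)          # remember first reusable slot
            last, block = block, block.next
        if free is None:                       # chain is full: append a block
            new = EdgeBlock()
            if last is None:
                self.head[src] = new
            else:
                last.next = new
            free = (new, 0)
        b, i = free
        b.neighbors[i], b.weights[i], b.timestamps[i] = dst, weight, ts

    def delete(self, src, dst):
        block = self.head.get(src)
        while block is not None:
            for i, n in enumerate(block.neighbors):
                if n == dst:
                    block.neighbors[i] = -dst - 1   # mark deleted by negation
                    return True
            block = block.next
        return False

    def neighbors(self, src):
        out, block = [], self.head.get(src)
        while block is not None:
            out.extend(n for n in block.neighbors if n is not None and n >= 0)
            block = block.next
        return out
```

Deleting by negation leaves the slot in place, so a later insertion can reuse it without moving any other edge; that is why the blocks stay only "semi-dense" until a compaction pass runs.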
26 Massive Streaming Data Analytics Accumulate as much of the recent graph data as possible in main memory. Insertions / Deletions Pre-process, Sort, Reconcile Age off old vertices Affected vertices Alter graph Update metrics STINGER graph Change detection David A. Bader 26
27 STING: High-level architecture Server: Graph storage, kernel orchestration OpenMP + sufficiently POSIX-ish Multiple processes for resilience David A. Bader 27
28 Why not existing technologies? Traditional SQL databases Not structured to do any meaningful graph queries with any level of efficiency or timeliness Graph databases - mostly on-disk Distributed disk can keep up with storing / indexing, but is simply too slow at random graph access to keep processing as the graph updates Hadoop and HDFS-based projects Not really the right programming model for many structural queries over the entire graph, random access performance is poor Smaller graph libraries, processing tools Can't scale, can't process dynamic graphs, frequently leads to impossible visualization attempts David A. Bader 28
29 STINGER Streaming Graph Results Triangle counting / clustering coefficients: up to 130k graph updates per second on X5570 (Nehalem-EP, 2.93GHz). Connected components & spanning forest: over 88k graph updates per second on X5570. Community detection & maintenance: up to 100 million updates per second on a 4-socket, 40-core Westmere-EX (Note: Most updates do not change communities...). Incremental PageRank: reduce latency by > 2x over restarting. Betweenness centrality: O(|V| (|V| + |E|)), can be sampled. Speed-ups of over static recomputation David A. Bader 29
30 Outline Overview of Georgia Tech STINGER: Streaming Analytics Case study 1: Seed Set Expansion Case study 2: Hybrid Betweenness Centrality Future architectures Conclusions David A. Bader 30
31 Streaming Seed Set Expansion [Joint research with Anita Zakrzewska] Given a set of seed vertices of interest, seed set expansion finds the best subgraph or set of communities containing the seeds. The communities found can be used to identify and track groups of interacting entities. The results also aid visualization or performing more computationally expensive analytics. In a dynamic graph, the optimal communities may change as the graph evolves. Incremental updates can be faster than recomputing from scratch. We can track changes over time and detect when interesting changes occur, such as the expanded communities merging or splitting. [Figure: seed vertices shown in red; expanded communities may change] David A. Bader 31
32 Streaming Seed Set Expansion To track seed vertices of interest over time, we run a seed set expansion from each seed and maintain the communities over time The overlap of expanded sets is used to determine when the communities merge together or split apart. This can alert us to interesting changes and events Each individual expansion is incrementally updated as the graph changes New information may cause a community to expand or reduce in size. David A. Bader 32
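As a rough illustration of using overlap to flag merges and splits, the sketch below compares the expanded sets of each pair of seeds before and after an update. The slide does not say which overlap measure or cutoff is used, so the Jaccard index, the `threshold` value of 0.5, and the function names are all assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard overlap of two vertex sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_events(prev, curr, threshold=0.5):
    """prev/curr map seed -> expanded vertex set, before and after an update.

    Returns ("merge" | "split", seed_a, seed_b) tuples for seed pairs whose
    overlap crossed the (assumed) threshold between the two snapshots.
    """
    events = []
    seeds = sorted(prev.keys() & curr.keys())
    for i, s in enumerate(seeds):
        for t in seeds[i + 1:]:
            before = jaccard(prev[s], prev[t]) >= threshold
            after = jaccard(curr[s], curr[t]) >= threshold
            if after and not before:
                events.append(("merge", s, t))
            elif before and not after:
                events.append(("split", s, t))
    return events
```

The pairwise comparison is quadratic in the number of tracked seeds, which is acceptable when only a handful of seeds of interest are followed.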
33 Streaming Seed Set Expansion Since seed set expansion retrieves the community of a seed vertex, the order of member selection is important. Simply re-testing vertices for membership may not detect the change in the community. Greedy seed set expansion results in a monotonically increasing sequence of fitness values as vertices are iteratively added to the community. We perform streaming greedy seed set expansion by updating the fitness sequence and detecting drops in value. After a drop in fitness, the community is selectively pruned and the fitness scores remain increasing. [Figure panels: initial sequence of fitness scores; after a change in the graph; after updating the community by pruning] David A. Bader 33
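The fitness-sequence idea can be sketched in a few lines of Python. Several things here are assumptions rather than the authors' method: the fitness function (fraction of edge endpoints that stay inside the community), the pruning rule (truncate the member order at the first drop, then resume greedy expansion), and all function names. The sketch only aims to show why keeping the insertion order makes a drop detectable.

```python
def build_adj(edges):
    """Undirected adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def fitness(adj, members):
    """Assumed metric: internal edge endpoints / all edge endpoints of members."""
    internal = boundary = 0
    for u in members:
        for v in adj.get(u, ()):
            if v in members:
                internal += 1
            else:
                boundary += 1
    total = internal + boundary
    return internal / total if total else 0.0


def fitness_sequence(adj, order):
    """Fitness after each member of `order` is added in turn."""
    return [fitness(adj, set(order[:i + 1])) for i in range(len(order))]


def greedy_expand(adj, order):
    """Extend `order` (seed first) while some neighbor strictly raises fitness."""
    order = list(order)
    members = set(order)
    scores = fitness_sequence(adj, order)
    while True:
        frontier = {v for u in members for v in adj.get(u, ())} - members
        best, best_score = None, scores[-1]
        for v in sorted(frontier):
            s = fitness(adj, members | {v})
            if s > best_score:
                best, best_score = v, s
        if best is None:
            return order, scores
        order.append(best)
        members.add(best)
        scores.append(best_score)


def update_after_change(adj, order):
    """Re-score the old member order on the changed graph, cut it at the
    first drop in fitness, then continue greedy expansion from there."""
    scores = fitness_sequence(adj, order)
    keep = len(order)
    for i in range(1, len(scores)):
        if scores[i] < scores[i - 1]:
            keep = i
            break
    return greedy_expand(adj, order[:keep])
```

For example, expanding from seed 0 on a triangle {0, 1, 2} attached to a hub vertex 3 yields the member order [0, 1, 2]; if vertex 2 then loses its edge to 1 and gains several outside neighbors, re-scoring that order shows a drop at the third position and the update keeps only [0, 1].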
34 Streaming Seed Set Expansion The quality of communities produced with our incremental updating approach, compared to recomputing, is measured by precision and recall. Our method maintains high precision and recall as the graph changes. The red dotted line gives a baseline, showing that the communities do change significantly over time. Our incremental updating provides a speedup of two orders of magnitude over standard complete recomputation, depending on community size. David A. Bader 34
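For reference, precision and recall of an incrementally maintained community against a freshly recomputed one can be computed as below. This is a generic sketch with a hypothetical function name, treating the recomputed community as ground truth; the convention of returning 1.0 for an empty set is an assumption.

```python
def precision_recall(incremental, recomputed):
    """Precision and recall of the incremental community,
    using the recomputed community as the reference."""
    incremental, recomputed = set(incremental), set(recomputed)
    true_pos = len(incremental & recomputed)
    precision = true_pos / len(incremental) if incremental else 1.0
    recall = true_pos / len(recomputed) if recomputed else 1.0
    return precision, recall
```

High precision means the incremental community contains few vertices the recomputation would exclude; high recall means it misses few vertices the recomputation would include.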
35 Outline Overview of Georgia Tech STINGER: Streaming Analytics Case study 1: Seed Set Expansion Case study 2: Hybrid Betweenness Centrality Future architectures Conclusions David A. Bader 35
36 Streaming Graph Analysis [Joint work with Adam McLaughlin] A. McLaughlin, J. Riedy, and D.A. Bader, Optimizing Energy Consumption and Parallel Performance for Betweenness Centrality using GPUs, The 18th Annual IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, September 9-11, Rising Stars Session A. McLaughlin and D.A. Bader, Scalable and High Performance Betweenness Centrality on the GPU, The 26th IEEE and ACM Supercomputing Conference (SC14), New Orleans, LA, November 16-21, Best Student Paper Finalist. Challenges: Parallel/GPU programs are notoriously difficult to write Existing algorithms are inefficient in terms of time, space, energy Betweenness centrality is computationally expensive David A. Bader 36
37 Centrality in Massive Social Network Analysis Centrality metrics: Quantitative measures to capture the importance of person in a social network Betweenness is a global index related to shortest paths that traverse through the person Can be used for community detection as well Identifying central nodes in large complex networks is the key metric in a number of applications: Biological networks, protein-protein interactions Sexual networks and AIDS Identifying key actors in terrorist networks Organizational behavior Supply chain management Transportation networks David A. Bader 37
38 Contributions A work-efficient algorithm for computing Betweenness Centrality on the GPU Works especially well for high-diameter graphs On-line hybrid approaches that coordinate threads based on graph structure An average speedup of 2.71x over the best existing methods A distributed implementation that scales linearly to up to 192 GPUs Results that are performance portable across the gamut of network structures Open Source code: David A. Bader 38
39 Betweenness Centrality (BC) Key metric in social network analysis [Freeman 77, Goh 02, Newman 03, Brandes 03] $BC(v) = \sum_{s \neq v \neq t \in V} \frac{\sigma_{st}(v)}{\sigma_{st}}$ where $\sigma_{st}$ is the number of shortest paths between vertices s and t, and $\sigma_{st}(v)$ is the number of shortest paths between vertices s and t passing through v. Exact BC is compute-intensive David A. Bader 39
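To make the definition concrete, here is a sequential Python sketch of the standard Brandes-style computation for unweighted graphs. It is a CPU illustration, not the GPU code discussed in the talk. The function name is hypothetical, `adj` is assumed to contain every vertex as a key, and the sum runs over ordered pairs (s, t), so on an undirected graph each unordered pair is counted twice. Instead of storing predecessor lists, the sketch re-checks distances during the backward sweep.

```python
from collections import deque


def betweenness(adj):
    """Exact betweenness centrality, O(|V| (|V| + |E|)) on unweighted graphs.

    adj: dict mapping every vertex to an iterable of its neighbors.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        sigma = {v: 0 for v in adj}     # number of shortest s-v paths
        dist = {v: -1 for v in adj}     # BFS depth from s (-1 = unvisited)
        sigma[s], dist[s] = 1, 0
        order = []                      # vertices in non-decreasing depth
        queue = deque([s])
        while queue:                    # forward sweep: BFS path counting
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
        delta = {v: 0.0 for v in adj}   # dependency of s on each vertex
        for w in reversed(order):       # backward sweep: accumulate
            for v in adj[w]:
                if dist[v] == dist[w] - 1:          # v precedes w on a path
                    delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On the path 0 - 1 - 2, only vertex 1 lies between another pair, and it receives a score of 2.0 (one for each ordered pair (0, 2) and (2, 0)).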
40 Prior GPU Implementations Vertex and Edge Parallelism [Jia et al. (2011)] Same coarse-grained strategy Edge-parallel approach better utilizes the GPU GPU-FAN [Shi and Zhang (2011)] Reported 11-19% speedup over Jia et al. Results were limited in scope Devote entire GPU to fine-grained parallelism Both use predecessor arrays Our approach: eliminate this array Both use graph traversals Our approach: trade-off memory bandwidth and work David A. Bader 40
41 Coarse-grained Parallelization Strategy David A. Bader 41
42 Fine-grained Parallelization Strategy Vertex-parallel downward traversal Threads are assigned to each vertex Only a subset is active Variable number of edges to traverse per thread David A. Bader 42
47 Fine-grained Parallelization Strategy Edge-parallel downward traversal Threads are assigned to each edge Only a subset is active Balanced amount of work per thread David A. Bader 47
52 Fine-grained Parallelization Strategy Work-efficient downward traversal Threads are assigned vertices in the frontier Use an explicit queue Variable number of edges to traverse per thread David A. Bader 52
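The trade-off between the three traversal strategies can be illustrated by counting how many items each one inspects during a single breadth-first traversal. The Python sketch below is a sequential model, not GPU code, and its names are hypothetical. It assumes that the vertex-parallel scheme checks every vertex at every level, that the edge-parallel scheme checks every directed edge slot at every level, and that the queue-based scheme only touches edges leaving the current frontier.

```python
def bfs_work(adj, source):
    """Count inspected items for one BFS under three assignment schemes.

    adj: dict mapping every vertex to a list of neighbors.
    """
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values())   # directed edge slots
    dist = {v: -1 for v in adj}
    dist[source] = 0
    frontier = [source]                           # explicit queue
    levels = 0
    frontier_edges = 0
    while frontier:
        levels += 1
        next_frontier = []
        for v in frontier:
            for w in adj[v]:
                frontier_edges += 1               # only frontier edges touched
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    next_frontier.append(w)
        frontier = next_frontier
    return {
        "levels": levels,
        "vertex_parallel_checks": n * levels,     # one thread per vertex, per level
        "edge_parallel_checks": m * levels,       # one thread per edge, per level
        "work_efficient_checks": frontier_edges,  # threads only on the frontier
    }
```

On a 5-vertex path traversed from one end, the model gives 25 vertex checks, 40 edge checks, and 8 frontier-edge checks, which shows why the explicit queue pays off as the number of levels (and hence the diameter) grows, while on low-diameter graphs the per-level schemes waste much less.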
57 Motivation for Hybrid Methods No one method of parallelization works best High diameter: Only do useful work Low diameter: Leverage memory bandwidth David A. Bader 57
58 Experimental Setup Single-node: CPU (4 Cores) Intel Core i7-2600k 3.4 GHz, 8MB Cache; GPU NVIDIA GeForce GTX Titan 14 SMs, 837 MHz, 6 GB GDDR5, Compute Capability 3.5. Multi-node (KIDS): CPUs (2 x 4 Cores) Intel Xeon X GHz, 8 MB Cache; GPUs (3) NVIDIA Tesla M SMs, 1.3 GHz, 6 GB GDDR5, Compute Capability 2.0; Infiniband QDR Network. All times are reported in seconds David A. Bader 58
59 Benchmark Data Sets
Name Vertices Edges Diam. Significance
af_shell9 504,855 8,542, Sheet Metal Forming
caidarouterlevel 192, , Internet Router Level
cnr ,527 2,738, Web crawl
com-amazon 334, , Product co-purchasing
delaunay_n20 1,048,576 3,145, Random Triangulation
kron_g500-logn20 524,288 21,780,787 6 Kronecker Graph
loc-gowalla 196,591 1,900, Geosocial
luxembourg.osm 114, ,666 1,336 Road Network
rgg_n_2_20 1,048,576 6,891, Random Geometric
smallworld 100, ,998 9 Logarithmic Diameter
David A. Bader 59
60 Scaling Results (Delaunay) Sparse meshes Speedup grows with graph scale David A. Bader 60
61 Scaling Results (Delaunay) Sparse meshes Speedup grows with graph scale When edge-parallel is best it's best by a matter of ms David A. Bader 61
62 Scaling Results (Delaunay) Sparse meshes Speedup grows with graph scale When edge-parallel is best it's best by a matter of ms When sampling is best it's by a matter of days David A. Bader 62
63 Multi-GPU Results Linear speedups when graphs are sufficiently large 10+ GTEPS for 192 GPUs Scaling isn't unique to graph structure Abundant coarse-grained parallelism David A. Bader 63
64 New Results for Streaming Graphs on GPUs Work-efficient approach obtains up to 13x speedup for high-diameter graphs Tradeoff between work-efficiency and DRAM utilization maximizes performance Our algorithms easily scale to many GPUs Linear scaling on up to 192 GPUs (10+ GTEPS) Scalability independent of network structure Dynamic updates are 6.9x faster than static recomputations and consume 83% less energy David A. Bader 64
65 Outline Overview of Georgia Tech STINGER: Streaming Analytics Case study 1: Seed Set Expansion Case study 2: Hybrid Betweenness Centrality Future architectures Conclusions David A. Bader 65
66 Graph500 Benchmark: Defining a new set of benchmarks to guide the design of hardware architectures and software systems intended to support such applications and to help procurements. Graph algorithms are a core part of many analytics workloads. Executive Committee: D.A. Bader, R. Murphy, M. Snir, A. Lumsdaine Five Business Area Data Sets: Cybersecurity 15 Billion Log Entries/Day (for large enterprises) Full Data Scan with End-to-End Join Required Medical Informatics 50M patient records, records/patient, billions of individuals Entity Resolution Important Social Networks Example, Facebook, Twitter Nearly Unbounded Dataset Size Data Enrichment Easily PB of data Example: Maritime Domain Awareness Hundreds of Millions of Transponders Tens of Thousands of Cargo Ships Tens of Millions of Pieces of Bulk Cargo May involve additional data (images, etc.) Symbolic Networks Example, the Human Brain 25B Neurons 7,000+ Connections/Neuron David A. Bader 66
67 Heterogeneity in Big Data systems: High Performance Data Analytics Analytic platforms will combine: Cloud (Hadoop/map-reduce) Stream processing Large shared-memory systems Massive multithreaded architectures Multicore and accelerators The challenge: developing methodologies for employing these complementary systems in an enterprise-class analytics framework for solving grand challenges in massive data analysis for discovery, real-time analytics, and forensics. Steve Mills, SVP of IBM Software (left), and Dr. John Kelly, SVP of IBM Research, view Stream Computing technology David A. Bader 67
68 Future Architectures Highly multithreaded High bandwidth (network and memory) Complex but flexible memory hierarchy Heterogeneous design in core capability and ISA David A. Bader 68
69 Disruptive Platform Changes In the next 3-5 years, memory is going to change. 3D stacked memory (IBM, NVIDIA) Hybrid memory cube (HMC Cons., Micron, Intel) Programming logic layer on-chip Possibly non-volatile Order of magnitude higher bandwidth Order of magnitude lower energy cost Interconnects are changing. Processor memory accelerator (NVLink, Phi) Data-center networks finally may change, not just ngbe David A. Bader 69
70 Revolutionary Changes in Memory Technology In the next 3-5 years, memory technologies will undergo a dramatic change 3-D stacked memory Programming logic layer on chip E.g. Hybrid Memory Cube (HMC) consortium Energy is a major constraint in both embedded (smartphone) and supercomputing systems, and 75% of the energy is spent moving data Logic layer allows operations to be performed on-memory chips without needing the round-trip to move the data from memory to processor This is a huge change in the relative cost of operations! David A. Bader 70
71 Outline Overview of Georgia Tech STINGER: Streaming Analytics Case study 1: Seed Set Expansion Case study 2: Hybrid Betweenness Centrality Future architectures Conclusions David A. Bader 71
72 Conclusions Massive-Scale Streaming Analytics will require new high-performance computing platforms, streaming algorithms, and energy-efficient implementations, and is promising for solving real-world challenges! Mapping applications to high performance architectures may yield 6 or more orders of magnitude performance improvement David A. Bader 72
73 Acknowledgments Jason Riedy, Research Scientist, (Georgia Tech) Graduate Students (Georgia Tech): James Fairbanks Rob McColl Eisha Nathan Anita Zakrzewska Bader Alumni: Dr. David Ediger (GTRI) Dr. Oded Green (ArrayFire) Dr. Seunghwa Kang (Pacific Northwest National Lab) Prof. Kamesh Madduri (Penn State) Dr. Guojing Cong (IBM TJ Watson Research Center) David A. Bader 73
74 Bader, Related Recent Publications ( ) D.A. Bader, G. Cong, and J. Feo, On the Architectural Requirements for Efficient Execution of Graph Algorithms, The 34th International Conference on Parallel Processing (ICPP 2005), pp , Georg Sverdrups House, University of Oslo, Norway, June 14-17, D.A. Bader and K. Madduri, Design and Implementation of the HPCS Graph Analysis Benchmark on Symmetric Multiprocessors, The 12th International Conference on High Performance Computing (HiPC 2005), D.A. Bader et al., (eds.), Springer-Verlag LNCS 3769, , Goa, India, December D.A. Bader and K. Madduri, Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2, The 35th International Conference on Parallel Processing (ICPP 2006), Columbus, OH, August 14-18, D.A. Bader and K. Madduri, Parallel Algorithms for Evaluating Centrality Indices in Real-world Networks, The 35th International Conference on Parallel Processing (ICPP 2006), Columbus, OH, August 14-18, K. Madduri, D.A. Bader, J.W. Berry, and J.R. Crobak, Parallel Shortest Path Algorithms for Solving Large-Scale Instances, 9th DIMACS Implementation Challenge -- The Shortest Path Problem, DIMACS Center, Rutgers University, Piscataway, NJ, November 13-14, K. Madduri, D.A. Bader, J.W. Berry, and J.R. Crobak, An Experimental Study of A Parallel Shortest Path Algorithm for Solving Large-Scale Graph Instances, Workshop on Algorithm Engineering and Experiments (ALENEX), New Orleans, LA, January 6, J.R. Crobak, J.W. Berry, K. Madduri, and D.A. Bader, Advanced Shortest Path Algorithms on a Massively-Multithreaded Architecture, First Workshop on Multithreaded Architectures and Applications (MTAAP), Long Beach, CA, March 30, D.A. Bader and K. Madduri, High-Performance Combinatorial Techniques for Analyzing Massive Dynamic Interaction Networks, DIMACS Workshop on Computational Methods for Dynamic Interaction Networks, DIMACS Center, Rutgers University, Piscataway, NJ, September 24-25, D.A. Bader, S. 
Kintali, K. Madduri, and M. Mihail, Approximating Betweenness Centrality, The 5th Workshop on Algorithms and Models for the Web-Graph (WAW2007), San Diego, CA, December 11-12, David A. Bader, Kamesh Madduri, Guojing Cong, and John Feo, Design of Multithreaded Algorithms for Combinatorial Problems, in S. Rajasekaran and J. Reif, editors, Handbook of Parallel Computing: Models, Algorithms, and Applications, CRC Press, Chapter 31, Kamesh Madduri, David A. Bader, Jonathan W. Berry, Joseph R. Crobak, and Bruce A. Hendrickson, Multithreaded Algorithms for Processing Massive Graphs, in D.A. Bader, editor, Petascale Computing: Algorithms and Applications, Chapman & Hall / CRC Press, Chapter 12, D.A. Bader and K. Madduri, SNAP, Small-world Network Analysis and Partitioning: an open-source parallel graph framework for the exploration of large-scale networks, 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), Miami, FL, April 14-18, S. Kang, D.A. Bader, An Efficient Transactional Memory Algorithm for Computing Minimum Spanning Forest of Sparse Graphs, 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Raleigh, NC, February Karl Jiang, David Ediger, and David A. Bader. Generalizing k-betweenness Centrality Using Short Paths and a Parallel Multithreaded Implementation. The 38th International Conference on Parallel Processing (ICPP), Vienna, Austria, September Kamesh Madduri, David Ediger, Karl Jiang, David A. Bader, Daniel Chavarría-Miranda. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets. 3rd Workshop on Multithreaded Architectures and Applications (MTAAP), Rome, Italy, May David A. Bader, et al. STINGER: Spatio-Temporal Interaction Networks and Graphs (STING) Extensible Representation David A. Bader 74
79 Acknowledgment of Support David A. Bader 79
