Applications and challenges in large-scale graph analysis. David A. Bader, Jason Riedy, Henning Meyerhenke
2 Exascale Streaming Data Analytics: Real-world challenges

All involve analyzing massive streaming complex networks:
- Health care: disease spread, detection and prevention of epidemics/pandemics (e.g. SARS, avian flu, H1N1 swine flu)
- Massive social networks: understanding communities, intentions, population dynamics, pandemic spread, transportation and evacuation
- Intelligence: business analytics, anomaly detection, security, knowledge discovery from massive data sets
- Systems biology: understanding complex life systems, drug design, microbial research, unraveling the mysteries of HIV; understanding life and disease
- Electric power grid: communication, transportation, energy, water, food supply
- Modeling and simulation: perform full-scale economic-social-political simulations

[Figure: users (millions) by quarter, Dec-04 through Dec-09. Exponential growth: more than 900 million active users.]

Ex: discovered minimal changes in an O(billions)-size complex network that could hide or reveal top influencers in the community.

Sample queries:
- Allegiance switching: identify entities that switch communities.
- Community structure: identify the genesis and dissipation of communities.
- Phase change: identify significant change in the network structure.

REQUIRES PREDICTING / INFLUENCE CHANGE IN REAL-TIME AT SCALE
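The allegiance-switching query can be illustrated with a minimal sketch. It assumes two snapshots of a community assignment (entity to community label) are already available and that labels are comparable across snapshots; the function name and the sample data are hypothetical and do not come from the talk.

```python
def allegiance_switches(before, after):
    """Return {entity: (old_label, new_label)} for entities whose community changed.

    Only entities present in both snapshots are compared; entities that
    appear or disappear between snapshots are ignored.
    """
    return {
        v: (before[v], after[v])
        for v in before.keys() & after.keys()
        if before[v] != after[v]
    }


# Hypothetical snapshots: "b" moves from community 0 to community 1.
before = {"a": 0, "b": 0, "c": 1, "d": 1}
after = {"a": 0, "b": 1, "c": 1, "e": 2}
switches = allegiance_switches(before, after)
print(switches)
```

In a streaming setting the hard part is producing consistent labels from one snapshot to the next; this sketch only covers the final comparison step.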
3 Current Example Data Rates

- Financial: NYSE processes 1.5 TB daily, maintains 8 PB
- Social: Facebook adds >100k users, 55M status updates, and 80M photos daily; ~1B active users with an average of 130 friend connections each. Foursquare reports 1.2M location check-ins per week
- Scientific: MEDLINE adds from 1 to 140 publications a day

Shared features: All data is rich and irregularly connected to other data. All is a mix of good and bad data, and much real data may be missing or inconsistent.
4 Ubiquitous High Performance Computing (UHPC)

Goal: develop highly parallel, security-enabled, power-efficient processing systems, supporting ease of programming, with resilient execution through all failure modes and intrusion attacks.

Architectural drivers:
- Energy efficiency
- Security and dependability
- Programmability

Program objectives:
- One PFLOPS in a single cabinet, including self-contained cooling
- 50 GFLOPS/W (equivalent to 20 pJ/FLOP)
- Total cabinet power budget of 57 kW, including processing resources, storage, and cooling
- Security embedded at all system levels
- Parallel, efficient execution models
- Highly programmable parallel systems
- Scalable systems from terascale to petascale

Echelon: Extreme-scale Compute Hierarchies with Efficient Locality-Optimized Nodes
David A. Bader (CSE), Echelon Leadership Team
"NVIDIA-Led Team Receives $25 Million Contract From DARPA to Develop High-Performance GPU Computing Systems" (MarketWatch)
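The program objectives can be checked with a little arithmetic. The sketch below uses only the three numbers on the slide; the variable names are illustrative, and reading the 50 GFLOPS/W target as applying to arithmetic alone is an assumption, since the slide does not say how the efficiency target and the 57 kW budget relate.

```python
PEAK_FLOPS = 1e15          # one PFLOPS
EFFICIENCY = 50e9          # 50 GFLOPS/W, i.e. 50e9 floating-point operations per joule
CABINET_BUDGET_KW = 57.0   # total cabinet power budget

# Energy per operation: the reciprocal of operations per joule, in picojoules.
energy_per_flop_pj = 1.0 / EFFICIENCY * 1e12

# Power needed to sustain the peak rate at the target efficiency.
compute_power_kw = PEAK_FLOPS / EFFICIENCY / 1e3

# What would remain in the cabinet budget under the assumption above.
remaining_kw = CABINET_BUDGET_KW - compute_power_kw

print(f"{energy_per_flop_pj:.1f} pJ/FLOP")
print(f"{compute_power_kw:.1f} kW at peak, {remaining_kw:.1f} kW remaining")
```

Under that reading, the efficiency target corresponds to about 20 pJ per operation and about 20 kW at one PFLOPS, well inside the stated cabinet budget.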
5 GRATEFUL: Graph Analysis Tackling power Efficiency, Uncertainty, and Locality

Objective: Research and develop new algorithms and software for crucial graph analysis problems in cybersecurity, intelligence integration, and network analysis.

In-the-field, embedded processing systems have limited computational capabilities due to severe power constraints. DARPA's Power Efficiency Revolution For Embedded Computing Technologies (PERFECT) program will address this need by increasing the computational power efficiency of these embedded systems. The project seeks a power efficiency of 75 GFLOPS/watt.

GRATEFUL will extend high-performance graph analysis algorithms to reduce power usage and provide resilience against imperfect data. We will build a model of concurrency requirements for graph algorithms. The project will provide energy-conscious and data error-resilient algorithms on streaming data, including computing and maintaining shortest path trees and graph decompositions into communities.

PIs: David A. Bader and Jason Riedy
Sponsored under the DARPA PERFECT program, Contract HR
6 Center for Adaptive Supercomputing Software for MultiThreaded Architectures (CASS-MT)

Launched July 2008
Pacific Northwest Lab, Georgia Tech, Sandia, WA State, Delaware

The newest breed of supercomputers has hardware set up not just for speed, but also to better tackle large networks of seemingly random data. And now, a multi-institutional group of researchers has been awarded over $14 million to develop software for these supercomputers. Applications include anywhere complex webs of information can be found: from internet security and power grid stability to complex biological networks.
7 Example: Mining Twitter for Social Good

ICPP 2010
Image credit: bioethicsinstitute.org
8 Massive Data Analytics: Protecting our Nation

- US high-voltage transmission grid (>150,000 miles of line)
- Public health: CDC / nation-scale surveillance of public health
- Cancer genomics and drug design: computed betweenness centrality of the human proteome

[Figure: Human Genome core protein interactions, Degree vs. Betweenness Centrality. x-axis: Degree; y-axis: Betweenness Centrality (log scale, ticks from 1e+0 down to 1e-6). Highlighted point: ENSG Kelch-like protein implicated in breast cancer.]
9 Network Analysis for Intelligence and Surveillance

- [Krebs 04] Post-9/11 terrorist network analysis from public-domain information
- Plot masterminds correctly identified from interaction patterns: centrality
- A global view of entities is often more insightful
- Detect anomalous activities by exact/approximate graph matching

Image source: T. Coffman, S. Greenblatt, S. Marcus, Graph-based technologies for intelligence analysis, CACM, 47 (3, March 2004): pp
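The slides name centrality, and elsewhere betweenness centrality, as the way key individuals are found, but they do not say which algorithm was used. As a minimal sketch, the code below computes exact betweenness centrality with Brandes' algorithm on a small unweighted, undirected graph. The function name, the adjacency-dictionary input, and the example graph are assumptions made for illustration.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph {vertex: [neighbors]}."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording shortest-path predecessors.
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        preds = {v: [] for v in adj}
        sigma[s], dist[s] = 1, 0
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse order of discovery.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2.0 for v, c in bc.items()}


# Hypothetical star network: every shortest path between leaves passes through "hub".
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
scores = betweenness_centrality(star)
print(max(scores, key=scores.get))
```

This serial version costs O(nm) time on a graph with n vertices and m edges, so at the scales discussed in the talk it would need parallel or approximate variants.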
10 Graphs are pervasive in large-scale data analysis

Sources of massive data: petascale simulations, experimental devices, the Internet, scientific applications.
New challenges for analysis: data sizes, heterogeneity, uncertainty, data quality.

Astrophysics
- Problem: outlier detection.
- Challenges: massive datasets, temporal variations.
- Graph problems: clustering, matching.

Bioinformatics
- Problem: identifying drug target proteins.
- Challenges: data heterogeneity, quality.
- Graph problems: centrality, clustering.

Social Informatics
- Problem: discover emergent communities, model spread of information.
- Challenges: new analytics routines, uncertainty in data.
- Graph problems: clustering, shortest paths, flows.

Image sources: (1) (2,3)
11 Graph Analytics for Social Networks

Are there new graph techniques? Do they parallelize? Can the computational systems (algorithms, machines) handle massive networks with millions to billions of individuals? Can the techniques tolerate noisy data, massive data, streaming data, etc.?

Communities may overlap, exhibit different properties and sizes, and be driven by different models.

- Detect communities (static or emerging)
- Identify important individuals
- Detect anomalous behavior
- Given a community, find a representative member of the community
- Given a set of individuals, find the best community that includes them
12 Graph500 Benchmark

Defining a new set of benchmarks to guide the design of hardware architectures and software systems intended to support such applications and to help procurements. Graph algorithms are a core part of many analytics workloads.

Executive Committee: D.A. Bader, R. Murphy, M. Snir, A. Lumsdaine

Five Business Area Data Sets:
- Cybersecurity: 15 billion log entries/day (for large enterprises); full data scan with end-to-end join required
- Medical Informatics: 50M patient records, records/patient, billions of individuals; entity resolution important
- Social Networks: example, Facebook, Twitter; nearly unbounded dataset size
- Data Enrichment: easily PB of data; example: Maritime Domain Awareness, with hundreds of millions of transponders, tens of thousands of cargo ships, tens of millions of pieces of bulk cargo; may involve additional data (images, etc.)
- Symbolic Networks: example, the human brain; 25B neurons, 7,000+ connections/neuron
13 STING Extensible Representation (STINGER)

Enhanced representation for dynamic graphs, developed in consultation with David A. Bader, Jon Berry, Adam Amos-Binks, Daniel Chavarría-Miranda, Charles Hastings, Kamesh Madduri, and Steven C. Poulos.

Design goals:
- Be useful for the entire large graph community
- Portable semantics and high-level optimizations across multiple platforms & frameworks (XMT C, MTGL, etc.)
- Permit good performance: no single structure is optimal for all
- Assume globally addressable memory access
- Support multiple, parallel readers and a single writer

Operations:
- Insert/update & delete both vertices & edges
- Aging-off: remove old edges (by timestamp)
- Serialization to support checkpointing, etc.
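The edge operations listed above can be made concrete with a toy sketch. This is not the STINGER API: the class name, method names, and dictionary-based storage are hypothetical, and the sketch ignores the parallel-reader/single-writer design goal and serialization entirely.

```python
class TimestampedGraph:
    """Toy directed dynamic graph: each edge carries a weight and a timestamp."""

    def __init__(self):
        self.adj = {}  # vertex -> {neighbor: [weight, timestamp]}

    def insert_or_update(self, u, v, weight, ts):
        """Insert edge (u, v), or overwrite its weight and timestamp if present."""
        self.adj.setdefault(u, {})[v] = [weight, ts]
        self.adj.setdefault(v, {})  # make sure the endpoint exists as a vertex

    def delete_edge(self, u, v):
        """Delete edge (u, v); return True if it existed."""
        return self.adj.get(u, {}).pop(v, None) is not None

    def age_off(self, cutoff):
        """Remove every edge with a timestamp older than cutoff; return the count."""
        removed = 0
        for nbrs in self.adj.values():
            old = [v for v, (_, ts) in nbrs.items() if ts < cutoff]
            for v in old:
                del nbrs[v]
            removed += len(old)
        return removed

    def edges(self):
        return sorted((u, v) for u, nbrs in self.adj.items() for v in nbrs)


g = TimestampedGraph()
g.insert_or_update("a", "b", 1.0, ts=10)
g.insert_or_update("b", "c", 1.0, ts=20)
g.insert_or_update("a", "b", 2.0, ts=30)  # update refreshes the timestamp
removed = g.age_off(cutoff=25)            # drops only the stale b -> c edge
print(removed, g.edges())
```

Because an update refreshes the timestamp, aging-off removes only edges that have not been seen recently, which is what keeps a streaming graph bounded in size.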
14 STING Extensible Representation

- Semi-dense edge list blocks with free space
- Compactly stores timestamps, types, weights
- Maps from application IDs to storage IDs
- Deletion by negating IDs, separate compaction
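A rough sketch of how these four points could fit together follows. It is an illustration under stated assumptions, not the actual STINGER layout: BLOCK_SIZE, both class names, and the choice to start storage IDs at 1 (so that a negated ID is always distinguishable from a live one) are all hypothetical.

```python
BLOCK_SIZE = 4  # hypothetical block capacity, kept small for illustration


class IdMap:
    """Maps arbitrary application IDs to dense storage IDs starting at 1."""

    def __init__(self):
        self.to_storage = {}
        self.to_app = [None]  # index 0 unused so that no storage ID is 0

    def storage_id(self, app_id):
        if app_id not in self.to_storage:
            self.to_storage[app_id] = len(self.to_app)
            self.to_app.append(app_id)
        return self.to_storage[app_id]


class EdgeBlocks:
    """One vertex's edges in fixed-size blocks of [neighbor, weight, timestamp] slots."""

    def __init__(self):
        self.blocks = []

    def insert(self, nbr, weight, ts):
        for block in self.blocks:
            for slot in block:
                if slot[0] == nbr:           # live edge exists: update in place
                    slot[1], slot[2] = weight, ts
                    return
        for block in self.blocks:
            if len(block) < BLOCK_SIZE:      # use free space in an existing block
                block.append([nbr, weight, ts])
                return
        self.blocks.append([[nbr, weight, ts]])

    def delete(self, nbr):
        for block in self.blocks:
            for slot in block:
                if slot[0] == nbr:
                    slot[0] = -nbr           # mark as deleted; slot is reclaimed later
                    return True
        return False

    def compact(self):
        """Separate pass that drops deleted slots and repacks the blocks."""
        live = [s for block in self.blocks for s in block if s[0] > 0]
        self.blocks = [live[i:i + BLOCK_SIZE] for i in range(0, len(live), BLOCK_SIZE)]

    def neighbors(self):
        return [s[0] for block in self.blocks for s in block if s[0] > 0]


ids = IdMap()
edges = EdgeBlocks()
for name in ["alice", "bob", "carol", "dave", "erin"]:
    edges.insert(ids.storage_id(name), 1.0, ts=0)
edges.delete(ids.storage_id("bob"))
blocks_before = len(edges.blocks)
edges.compact()
print(edges.neighbors(), blocks_before, len(edges.blocks))
```

Marking a slot by negation makes deletion a single write that readers can skip over, while the more expensive repacking is deferred to a separate compaction pass.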
15 STINGER Software Dissemination
19 Acknowledgment of Support
20 SIAM CSE13: Frontiers in Large-Scale Graph Analysis

2:00-2:25 Applications and challenges in large-scale graph analysis
David A. Bader and Jason Riedy, Georgia Institute of Technology, USA; Henning Meyerhenke, Karlsruhe Institute of Technology, Germany

2:30-2:55 Large scale graph analytics and randomized algorithms for applications in cybersecurity
John Johnson, Pacific Northwest National Laboratory, USA

3:00-3:25 Anomaly Detection in Very Large Graphs: Modeling and Computational Considerations
Benjamin Miller, Nicholas Arcolano, Edward Rutledge, Matthew Schmidt, and Nadya Bliss, Massachusetts Institute of Technology, USA

3:30-3:55 Combinatorial and Numerical Algorithms for Network Analysis
Henning Meyerhenke and Christian Staudt, Karlsruhe Institute of Technology, Germany

Part II (MS179), Thursday, February 28, 9:30 AM - 11:30 AM

9:30-9:55 Are we there yet? When to stop a Markov chain while generating random graphs?
Ali Pinar, Jaideep Ray, and C. Seshadhri, Sandia National Laboratories, USA

10:00-10:25 Analyzing graph structure in streaming data with STINGER
Jason Riedy, David A. Bader, Robert C. McColl, and David Ediger, Georgia Institute of Technology, USA

10:30-10:55 High-Performance Filtered Queries in Attributed Semantic Graphs
John R. Gilbert, University of California, Santa Barbara, USA; Aydin Buluc, Lawrence Berkeley National Laboratory, USA; Armando Fox, University of California, Berkeley, USA; Shoaib Kamil, Massachusetts Institute of Technology, USA; Adam Lugowski, University of California, Santa Barbara, USA; Leonid Oliker and Samuel Williams, Lawrence Berkeley National Laboratory, USA

11:00-11:25 Large-Scale Graph-Structured Machine Learning: GraphLab in the Cloud and GraphChi in your PC
Joseph Gonzalez and Carlos Guestrin, Carnegie Mellon University, USA
Large-Scale Data Processing
Large-Scale Data Processing Eiko Yoneki [email protected] http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
Research Statement Joseph E. Gonzalez [email protected]
As we scale to increasingly parallel and distributed architectures and explore new algorithms and machine learning techniques, the fundamental computational models and abstractions that once separated
Big Data Executive Survey
Big Data Executive Full Questionnaire Big Date Executive Full Questionnaire Appendix B Questionnaire Welcome The survey has been designed to provide a benchmark for enterprises seeking to understand the
Parallel Analysis and Visualization on Cray Compute Node Linux
Parallel Analysis and Visualization on Cray Compute Node Linux David Pugmire, Oak Ridge National Laboratory and Hank Childs, Lawrence Livermore National Laboratory and Sean Ahern, Oak Ridge National Laboratory
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA
SEMANTIC WEB BASED INFERENCE MODEL FOR LARGE SCALE ONTOLOGIES FROM BIG DATA J.RAVI RAJESH PG Scholar Rajalakshmi engineering college Thandalam, Chennai. [email protected] Mrs.
ISSN: 2320-1363 CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS
CONTEXTUAL ADVERTISEMENT MINING BASED ON BIG DATA ANALYTICS A.Divya *1, A.M.Saravanan *2, I. Anette Regina *3 MPhil, Research Scholar, Muthurangam Govt. Arts College, Vellore, Tamilnadu, India Assistant
Big Data Systems CS 5965/6965 FALL 2015
Big Data Systems CS 5965/6965 FALL 2015 Today General course overview Expectations from this course Q&A Introduction to Big Data Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2015.html
A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique
A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique Jyoti Malhotra 1,Priya Ghyare 2 Associate Professor, Dept. of Information Technology, MIT College of
Big Data Analytics. Chances and Challenges. Volker Markl
Volker Markl Professor and Chair Database Systems and Information Management (DIMA), Technische Universität Berlin www.dima.tu-berlin.de Big Data Analytics Chances and Challenges Volker Markl DIMA BDOD
Graph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
Cloud Storage Solution for WSN Based on Internet Innovation Union
Cloud Storage Solution for WSN Based on Internet Innovation Union Tongrang Fan 1, Xuan Zhang 1, Feng Gao 1 1 School of Information Science and Technology, Shijiazhuang Tiedao University, Shijiazhuang,
High Performance Spatial Queries and Analytics for Spatial Big Data. Fusheng Wang. Department of Biomedical Informatics Emory University
High Performance Spatial Queries and Analytics for Spatial Big Data Fusheng Wang Department of Biomedical Informatics Emory University Introduction Spatial Big Data Geo-crowdsourcing:OpenStreetMap Remote
IC05 Introduction on Networks &Visualization Nov. 2009. <[email protected]>
IC05 Introduction on Networks &Visualization Nov. 2009 Overview 1. Networks Introduction Networks across disciplines Properties Models 2. Visualization InfoVis Data exploration
Teaching Computational Thinking using Cloud Computing: By A/P Tan Tin Wee
Teaching Computational Thinking using Cloud Computing: By A/P Tan Tin Wee Technology in Pedagogy, No. 8, April 2012 Written by Kiruthika Ragupathi ([email protected]) Computational thinking is an emerging
Big Data Management in the Clouds and HPC Systems
Big Data Management in the Clouds and HPC Systems Hemera Final Evaluation Paris 17 th December 2014 Shadi Ibrahim [email protected] Era of Big Data! Source: CNRS Magazine 2013 2 Era of Big Data! Source:
Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov
Search and Data Mining: Techniques Applications Anya Yarygina Boris Novikov Introduction Data mining applications Data mining system products and research prototypes Additional themes on data mining Social
GeoGrid Project and Experiences with Hadoop
GeoGrid Project and Experiences with Hadoop Gong Zhang and Ling Liu Distributed Data Intensive Systems Lab (DiSL) Center for Experimental Computer Systems Research (CERCS) Georgia Institute of Technology
LDIF - Linked Data Integration Framework
LDIF - Linked Data Integration Framework Andreas Schultz 1, Andrea Matteini 2, Robert Isele 1, Christian Bizer 1, and Christian Becker 2 1. Web-based Systems Group, Freie Universität Berlin, Germany [email protected],
HPC Programming Framework Research Team
HPC Programming Framework Research Team 1. Team Members Naoya Maruyama (Team Leader) Motohiko Matsuda (Research Scientist) Soichiro Suzuki (Technical Staff) Mohamed Wahib (Postdoctoral Researcher) Shinichiro
Bob Boothe. Education. Research Interests. Teaching Experience
Bob Boothe Computer Science Dept. University of Southern Maine 96 Falmouth St. P.O. Box 9300 Portland, ME 04103--9300 (207) 780-4789 email: [email protected] 54 Cottage Park Rd. Portland, ME 04103 (207)
Big Data a threat or a chance?
Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but
