urika! Unlocking the Power of Big Data at PSC
1 urika! Unlocking the Power of Big Data at PSC Nick Nystrom Director, Strategic Applications Pittsburgh Supercomputing Center February 1, Pittsburgh Supercomputing Center
2 Big Data is Everywhere 2013 Pittsburgh Supercomputing Center 2
3 Some Important Types of Big Data
- Structured, regular, mostly independent: Pan-STARRS telescope; genome sequencers (Wikipedia Commons)
- Structured, regular, coupled or independent: NOAA climate modeling; cosmological simulations of black hole formation (Tiziana Di Matteo)
- Unstructured, irregular, nonpartitionable: web graph of major Internet routers (Ross Richardson and Fan Chung Graham); human interactome (Wikipedia Commons)
4 Graphs Represent Sets of Complex Relationships
- A graph is a set of vertices (or nodes) and edges.
- Edges can be directed.
(Figure: example graph with vertices a, b, c, d, e, f.)
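A minimal sketch of the definition above, assuming an adjacency-set representation. The vertex labels a to f come from the slide's figure, but the edge list is hypothetical, since the figure's edges are not recoverable from the transcript.

```python
def build_adjacency(vertices, edges, directed=True):
    """Map each vertex to the set of vertices reachable over one edge."""
    adjacency = {v: set() for v in vertices}
    for source, target in edges:
        adjacency[source].add(target)
        if not directed:
            # An undirected edge can be followed in both directions.
            adjacency[target].add(source)
    return adjacency


VERTICES = ["a", "b", "c", "d", "e", "f"]
# Hypothetical edges, chosen only for illustration.
EDGES = [("a", "b"), ("a", "e"), ("b", "c"), ("e", "f"), ("f", "d"), ("d", "c")]

directed = build_adjacency(VERTICES, EDGES, directed=True)
undirected = build_adjacency(VERTICES, EDGES, directed=False)

print(sorted(directed["a"]))    # vertices that a points to
print(sorted(directed["c"]))    # c has no outgoing edges in the directed view
print(sorted(undirected["c"]))  # but it has two neighbors when direction is ignored
```

The same edge list gives different neighborhoods depending on whether direction is respected, which is why a graph's description has to state whether its edges are directed.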
5 An Example: the WMATA Map
- Metro stops are the nodes, and the routes between them are the edges.
- There are 6 lines (red, orange, yellow, blue, green, silver), which can also be expressed as 6 graphs having some nodes in common.
- Distances between stops, if expressed, could make this a weighted graph.
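The idea of several line graphs sharing nodes, and of distances turning them into one weighted graph, can be sketched as follows. The stop names and distances are invented placeholders, not actual WMATA data, and only two hypothetical lines are shown.

```python
# Each line is its own small graph: a list of (stop, next_stop, distance_km) edges.
# All names and distances here are hypothetical.
LINES = {
    "red": [("North", "Center", 2.0), ("Center", "South", 3.5)],
    "blue": [("West", "Center", 1.5), ("Center", "East", 4.0)],
}


def stops(line_edges):
    """Return the set of stops (nodes) that appear on one line."""
    return {stop for u, v, _ in line_edges for stop in (u, v)}


def merge_lines(lines):
    """Combine all lines into one undirected weighted graph keyed by stop pair."""
    weights = {}
    for edges in lines.values():
        for u, v, distance in edges:
            weights[frozenset((u, v))] = distance
    return weights


shared = stops(LINES["red"]) & stops(LINES["blue"])  # transfer stops
network = merge_lines(LINES)

print(shared)
print(len(network), "weighted edges in the combined map")
```

Keeping each line as a separate edge list preserves the "6 graphs" view, while the merged dictionary is the single weighted graph.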
6 Another Example: from NELL
- In the Never Ending Language Learner (William Cohen and Tom Mitchell, CMU), graphs are semantic networks of beliefs, with nodes representing entities and edges representing their relationships.
(Figure: the concepts Person, Politician, Bill_Clinton, and George_Bush, connected by "type" and "knows" edges.)
7 Graph Analytics
Graph analytics is the science of solving problems using data expressed as graphs. Examples:
- Revealing patterns and relationships
- Community detection
- Anomaly detection
- Classification (e.g. via structural characterization)
- Inferencing
8 Graph Nonpartitionability
- A cut is a partition of nodes into two subsets.
- The cut size is the number of edges joining those subsets of nodes.
- Examples from the figure: V = 8, E = 8, ξ = 2; and V = 8, E = 28, ξ = 18.
- If the cut size is not substantially smaller than the number of edges, the graph is nonpartitionable.
- Many important graph problems are nonpartitionable.
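The cut-size definition above can be sketched in a few lines. The slide's figure is not reproduced in this transcript, so the two graphs below are assumptions that only match the slide's vertex and edge counts: an 8-vertex ring (8 edges) and an 8-vertex complete graph (28 edges). The cut values printed are for the particular split chosen here and are not meant to reproduce the slide's numbers.

```python
from itertools import combinations


def cut_size(edges, subset):
    """Count edges with exactly one endpoint inside `subset`."""
    subset = set(subset)
    return sum(1 for u, v in edges if (u in subset) != (v in subset))


# Hypothetical sparse graph: a ring on 8 vertices (8 edges).
ring = [(i, (i + 1) % 8) for i in range(8)]
# Hypothetical dense graph: every pair of the 8 vertices is connected (28 edges).
complete = list(combinations(range(8), 2))

half = {0, 1, 2, 3}  # one side of the cut; the other side is {4, 5, 6, 7}

for name, edges in (("ring", ring), ("complete", complete)):
    size = cut_size(edges, half)
    print(f"{name}: E = {len(edges)}, cut = {size}, cut/E = {size / len(edges):.2f}")
```

For the ring the cut is a small fraction of the edges, so the graph splits cleanly. For the complete graph more than half of the edges cross the cut, which is the situation the slide calls nonpartitionable.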
9 Nonpartitionable = Challenging!
- Graph algorithms typically require following edges from vertex to vertex, which is irregular and unpredictable.
- For a large, nonpartitionable graph, data distribution onto a distributed system requires very frequent communications and remote memory accesses.
- The order of access is irregular and unpredictable, and the amount of data at each vertex is typically small.
- Messaging imposes high overhead (but PGAS can help).
- Caching is ineffective or counterproductive.
- Hardware prefetching from memory is counterproductive.
- Large graph problems require a specialized architecture.
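One way to see why distribution hurts is to count how many edges connect vertices stored on different machines. The sketch below is an illustration under stated assumptions, not a model of any real system: vertices are assigned to 4 hypothetical machines in contiguous blocks, and a ring graph (easy to partition) is compared with a randomly wired graph of the same vertex count.

```python
import random


def remote_fraction(edges, owner):
    """Fraction of edges whose endpoints live on different machines."""
    remote = sum(1 for u, v in edges if owner[u] != owner[v])
    return remote / len(edges)


def random_graph(num_vertices, num_edges, seed=0):
    """Generate distinct undirected edges between randomly chosen vertex pairs."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < num_edges:
        u, v = rng.sample(range(num_vertices), 2)
        edges.add((min(u, v), max(u, v)))
    return sorted(edges)


NUM_VERTICES = 1000
NUM_MACHINES = 4  # hypothetical cluster size
BLOCK = NUM_VERTICES // NUM_MACHINES

# Block distribution: vertices 0..249 on machine 0, 250..499 on machine 1, and so on.
owner = {v: v // BLOCK for v in range(NUM_VERTICES)}

ring = [(i, (i + 1) % NUM_VERTICES) for i in range(NUM_VERTICES)]
irregular = random_graph(NUM_VERTICES, 5000)

print(f"ring:      {remote_fraction(ring, owner):.3f} of edges are remote")
print(f"irregular: {remote_fraction(irregular, owner):.3f} of edges are remote")
```

In the ring only the few edges at block boundaries are remote. In the randomly wired graph most edge traversals leave the local machine, and each one moves only a small amount of data, which is the communication pattern the slide describes.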
10 Going Back to Our Examples
- Web graph of major Internet routers (Ross Richardson and Fan Chung Graham)
- Human interactome (Wikipedia Commons)
11 Sherlock: a YarcData urika Data Appliance with PSC Enhancements
- urika: Universal RDF Integration Knowledge Appliance
- Graph Analytics Platform (urika application architecture)
- 32 Graph Analytics Platform nodes, each containing 2 Cray Threadstorm 4.0 processors (128 threads/proc) and a SeaStar 2 ASIC
- 1 TB globally shared memory can accommodate graphs of up to ~10 billion edges
- Massive multithreading and sophisticated memory handling for latency hiding
- Next-generation Cray XMT
- Remote Memory Access block with Extended Memory Semantics, providing a single, shared address space
- General-purpose XT5 (AMD Opteron) nodes enable additional, heterogeneous applications
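As a rough back-of-the-envelope check on the figures above, the total hardware thread count and the memory budget per edge can be worked out directly. This is a sketch under two assumptions that the slide does not state: that the 1 TB is the aggregate shared memory across all nodes, and that TB means 10^12 bytes. Since "~10 billion edges" is itself approximate, the per-edge figure is only an order-of-magnitude estimate.

```python
NODES = 32
PROCESSORS_PER_NODE = 2
THREADS_PER_PROCESSOR = 128

# Assumptions (not stated on the slide): aggregate memory, decimal terabyte.
SHARED_MEMORY_BYTES = 10**12
MAX_EDGES = 10 * 10**9  # "~10 billion edges"

hardware_threads = NODES * PROCESSORS_PER_NODE * THREADS_PER_PROCESSOR
bytes_per_edge = SHARED_MEMORY_BYTES // MAX_EDGES

print(f"hardware threads: {hardware_threads}")
print(f"approximate memory budget per edge: {bytes_per_edge} bytes")
```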
12 The Sherlock Project
- Funded by the National Science Foundation through the Strategic Technologies for Cyberinfrastructure (STCI) program
- An experimental system, intended for the community to gain experience with urika, develop new kinds of applications, and pursue cutting-edge research
- For additional information, see
13 urika: Enabling Productive Graph Analytics
- Leverage emerging Web 3.0 standards
- The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. [1]
- Resource Description Framework (RDF)
- SPARQL Protocol and RDF Query Language
- Application-specific GUIs for user interaction
- This is a general framework that's useful for much more than the Semantic Web.
1. W3C Semantic Web Activity. World Wide Web Consortium (W3C). November 7,
14 RDF: Resource Description Framework
- Represents data as directed, labeled graphs
- Unifies data and metadata
- Although originally developed for the Web, the format is general
- Format: Triple = subject, predicate, object (or resource, property, value)
  - Subject: URI or blank node
  - Predicate: URI
  - Object: URI, blank node, or literal
- urika: Quad = triple + named graph
- This very general format allows fusing data from multiple sources
  - There are tools for converting data to RDF from RDBMS, Hadoop, etc.
  - But ontologies must either be the same or mapped onto each other
  - There are tools for this, too
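The triple and quad shapes described above can be sketched with plain tuples. This is an illustration only: real RDF identifies subjects and predicates with URIs, whereas short strings stand in for them here. The entity names are borrowed from the NELL slide; the edge directions and the two "sources" are assumptions made for the example.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class Quad(NamedTuple):
    subject: str
    predicate: str
    obj: str
    graph: str  # the named graph the triple belongs to


def in_graph(triples, graph_name):
    """Tag every triple with the named graph it came from."""
    return {Quad(*t, graph_name) for t in triples}


# Two hypothetical data sources that overlap in one statement.
source_a = {
    Triple("Bill_Clinton", "type", "Politician"),
    Triple("Politician", "type", "Person"),
}
source_b = {
    Triple("George_Bush", "type", "Politician"),
    Triple("Bill_Clinton", "knows", "George_Bush"),
    Triple("Bill_Clinton", "type", "Politician"),
}

# Fusing sources is a set union: the duplicated statement collapses.
fused = source_a | source_b
# With named graphs, the origin of each statement is preserved instead.
quads = in_graph(source_a, "source_a") | in_graph(source_b, "source_b")

politicians = sorted(
    t.subject for t in fused if t.predicate == "type" and t.obj == "Politician"
)
print(len(fused), len(quads), politicians)
```

The union works without any schema negotiation because both sources use the same names for the same things. If they did not, the names would first have to be mapped onto each other, which is the ontology-mapping caveat on the slide.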
15 SPARQL
SPARQL: SPARQL Protocol and RDF Query Language
- contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions
- supports extensible value testing and constraining queries by source RDF graph
- query results can be RDF graphs or simpler result sets
- is a convenient resource for experimenting with SPARQL queries
16 SPARQL Example 1
Query:
PREFIX rdfs: <
PREFIX type: <
PREFIX prop: <
SELECT ?country_name ?population
WHERE {
  ?country a type:landlockedcountries ;
           rdfs:label ?country_name ;
           prop:populationestimate ?population .
  FILTER (?population > && langmatches(lang(?country_name), "en")) .
}
17 SPARQL Example 1 (2)
Result columns: country_name, population
Rows returned, by country_name (population values are not preserved):
- "Kazakhstan"@en
- "Uzbekistan"@en
- "Afghanistan"@en
- "Ethiopia"@en
- "Burkina Faso"@en
- "Nepal"@en
- "Niger"@en
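To make the shape of such a query concrete, here is a minimal sketch of matching a conjunction of triple patterns and then applying a filter. It is not how urika or any SPARQL engine evaluates queries; it is a naive nested-loop matcher over an in-memory list. The countries, populations, prefixes, and threshold are all hypothetical, and the language-tag test from the example query is omitted.

```python
def is_var(term):
    """Pattern terms that start with '?' are variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if term in extended and extended[term] != value:
                return None  # variable already bound to something else
            extended[term] = value
        elif term != value:
            return None  # constant in the pattern does not match
    return extended


def query(triples, patterns, keep=lambda binding: True):
    """Match all patterns jointly, then keep bindings that pass the filter."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return [b for b in bindings if keep(b)]


# Hypothetical data: names and numbers are invented for illustration.
TRIPLES = [
    ("ex:A", "rdf:type", "ex:LandlockedCountry"),
    ("ex:A", "rdfs:label", "Country A"),
    ("ex:A", "ex:population", 20_000_000),
    ("ex:B", "rdf:type", "ex:LandlockedCountry"),
    ("ex:B", "rdfs:label", "Country B"),
    ("ex:B", "ex:population", 5_000_000),
    ("ex:C", "rdf:type", "ex:IslandCountry"),
    ("ex:C", "rdfs:label", "Country C"),
    ("ex:C", "ex:population", 30_000_000),
]
PATTERNS = [
    ("?country", "rdf:type", "ex:LandlockedCountry"),
    ("?country", "rdfs:label", "?name"),
    ("?country", "ex:population", "?population"),
]
THRESHOLD = 10_000_000  # hypothetical; the example query's threshold is not legible

rows = query(TRIPLES, PATTERNS, keep=lambda b: b["?population"] > THRESHOLD)
for row in rows:
    print(row["?name"], row["?population"])
```

Because the three patterns share the variable ?country, they act as a join: Country C is rejected by the type pattern and Country B by the filter, leaving only Country A.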
18 A Semantic Web Approach to Graph Analytics
- Unstructured data: sparse, irregular, no need to define schema a priori
- Express relationships as subject, predicate, object
- Analyzing data with relationships allows powerful inferencing to discover information that's buried in the data
- Applications: medicine, biology, finance, sociology, economics, machine learning, epidemiology, security, business, networking, ...
(Figure: graphical representation of part of the human metabolic network, focusing on genes in the skeletal muscles of diabetes patients. From Kiran Raosaheb Patil, Systems biology: Looking beyond the genome,)
19 Opening New Directions in Research
Complex analytics with urika: structural characterization, dynamics, ...
- New communities: reaching (again) the long tail of research
- New scales of complex analysis: real-world problems at scale
- New types of applications: focus on actionable insights from data
20 Using Sherlock: 3 Paradigms
New communities, new scales of analysis, new types of applications
- As a urika data appliance
  - RDF, SPARQL, SNORQL, GUIs
  - Extremely powerful for complex queries
  - Support for named graphs provides additional capability
- Using the Graph Analytics Platform for standalone applications
- Heterogeneous XT5 / Graph Analytics Platform applications
  - Compute nodes: C or C++, with XMT extensions for memory, synchronization, etc.
  - System: Java, Fortran, etc. on general-purpose XT5 nodes
  - Homogeneous (NG-XMT) and heterogeneous programming (XT5 + NG-XMT) to support a wide range of applications and to ease porting
21 Anticipated Research (Examples 1)
- Identification of genes and pathways leading to tumors
- Real-time social network analysis: recognizing misinformation and improving disaster response
- Never-Ending Language Learning (NELL)
- Agent-based epidemiological modeling
- Anomaly detection, feature extraction, and analysis of large, time-evolving graphs
- Large-scale, high-dimension analysis
- Detecting anomalous subgraphs, e.g. disease outbreaks
22 Anticipated Research (Examples 2)
- Cluster finding in astrophysics
- Understanding community dynamics across different types of social networks
- Data stream management systems and annotation management
- Generating sparse signal processing and graph kernels
- Extracting subpopulations from large social networks
- Genome sequence assembly
23 Enabling Breakthrough Research with Big Data
- Innovative hardware and software architectures allow researchers to tackle graph datasets at full scale.
- NSF users will increasingly exploit graph analytics as part of their data-intensive research.
- This is complementary to other aspects of HPC and big data.
PSC and XSEDE can help researchers in several ways:
- Disseminating information about resources
- Establishing best practices for XSEDE and other NSF platforms
- Assisting with data modeling
- Helping to develop workflows for exploratory and routine analytics
- Scaling and optimizing graph applications and algorithms
- Extending graph analytic capabilities, e.g. probabilistic predicates, reification
24 Thank You
More informationA Business Process Services Portal
A Business Process Services Portal IBM Research Report RZ 3782 Cédric Favre 1, Zohar Feldman 3, Beat Gfeller 1, Thomas Gschwind 1, Jana Koehler 1, Jochen M. Küster 1, Oleksandr Maistrenko 1, Alexandru
More informationA Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
More information3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India
3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India Call for Papers Cloud computing has emerged as a de facto computing
More informationIn-Database Analytics
Embedding Analytics in Decision Management Systems In-database analytics offer a powerful tool for embedding advanced analytics in a critical component of IT infrastructure. James Taylor CEO CONTENTS Introducing
More informationA Next-Generation Analytics Ecosystem for Big Data. Colin White, BI Research September 2012 Sponsored by ParAccel
A Next-Generation Analytics Ecosystem for Big Data Colin White, BI Research September 2012 Sponsored by ParAccel BIG DATA IS BIG NEWS The value of big data lies in the business analytics that can be generated
More informationHPC technology and future architecture
HPC technology and future architecture Visual Analysis for Extremely Large-Scale Scientific Computing KGT2 Internal Meeting INRIA France Benoit Lange benoit.lange@inria.fr Toàn Nguyên toan.nguyen@inria.fr
More informationPentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System
Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance
More informationThe basic data mining algorithms introduced may be enhanced in a number of ways.
DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,
More informationHadoopSPARQL : A Hadoop-based Engine for Multiple SPARQL Query Answering
HadoopSPARQL : A Hadoop-based Engine for Multiple SPARQL Query Answering Chang Liu 1 Jun Qu 1 Guilin Qi 2 Haofen Wang 1 Yong Yu 1 1 Shanghai Jiaotong University, China {liuchang,qujun51319, whfcarter,yyu}@apex.sjtu.edu.cn
More informationMake the Most of Big Data to Drive Innovation Through Reseach
White Paper Make the Most of Big Data to Drive Innovation Through Reseach Bob Burwell, NetApp November 2012 WP-7172 Abstract Monumental data growth is a fact of life in research universities. The ability
More informationThe Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
More informationThe data forest. Application. Application Application DATA. Office of Research
The data forest DATA Unfortunately Data to the rescue The Rensselaer IDEA HPC: Computational Science and Engineering + Data Science and Predictive Analytics + Cognitive Computing + Perceptualization DATA
More informationYarcData's urika Shows Big Data Is More Than Hadoop and Data Warehouses
G00232737 YarcData's urika Shows Big Data Is More Than Hadoop and Data Warehouses Published: 11 September 2012 Analyst(s): Carl Claunch The hype about big data is mostly on Hadoop or data warehouses, but
More informationUsing the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova
Using the Grid for the interactive workflow management in biomedicine Andrea Schenone BIOLAB DIST University of Genova overview background requirements solution case study results background A multilevel
More informationBuilding a Scalable Big Data Infrastructure for Dynamic Workflows
Building a Scalable Big Data Infrastructure for Dynamic Workflows INTRODUCTION Organizations of all types and sizes are looking to big data to help them make faster, more intelligent decisions. Many efforts
More informationThe University of Jordan
The University of Jordan Master in Web Intelligence Non Thesis Department of Business Information Technology King Abdullah II School for Information Technology The University of Jordan 1 STUDY PLAN MASTER'S
More informationBayesian networks - Time-series models - Apache Spark & Scala
Bayesian networks - Time-series models - Apache Spark & Scala Dr John Sandiford, CTO Bayes Server Data Science London Meetup - November 2014 1 Contents Introduction Bayesian networks Latent variables Anomaly
More information