4th Workshop on Big Data Benchmarking
1 4th Workshop on Big Data Benchmarking
2 4th WBDB: Welcome and Introduction. Chaitan Baru: Associate Director, Data Initiatives, San Diego Supercomputer Center; Director, Center for Large-scale Data Systems Research; University of California San Diego
3 Thanks! Brocade: providing the venue and catering (Sheri Mukai; Michele Limbocker; Suresh Vobillisetty). CLDS sponsors: Pivotal, Intel, NetApp, Seagate. CLDS Organizing Committee. Speakers/attendees. Springer-Verlag.
4 CLDS: Center for Large-scale Data Systems Research. R&D activity within San Diego Supercomputer Center. Current projects/activities: Big Data Benchmarking; opportunity to work with CS graduate students; Data Value; How Much Information; CSE Master of Advanced Studies (MAS) in Big Data Science; SDSC Data Science Institute, an initiative focused on onsite education and training in Data Science for industry.
5 SDSC: A national and UC-based center for high-performance computing and data-intensive computing (big data). Established >25 years ago. Engaged in Research + Development + Production (RDP). Offers datacenter services to UC, as well as to non-UC and industry partners.
6 Comet: System Characteristics. Planned for Jan 2015. Total flops ~ PF. Dell primary integrator: Intel processors, Mellanox InfiniBand, Aeon storage vendor. Standard compute nodes: Intel next-gen processors, 128 GB DRAM, 320 GB SSD. Large-memory nodes: 1.5 TB DRAM. GPU nodes. Hybrid fat-tree topology: FDR InfiniBand; rack-level full bisection bandwidth (72 nodes); 4:1 oversubscription cross-rack. Performance Storage: 7 PB, 200 GB/s; scratch and persistent storage. Durable Storage (reliability): 6 PB disk. Gateway hosting nodes and VM image repository. 100 Gbps external connectivity.
7 WBDB Background. Genesis of this effort: NSF Cluster Exploratory (CluE) research project, "On Performance Evaluation of On-Demand Provisioning of Data Intensive Applications" ( ). Led to a study of benchmarks to compare Hadoop and relational DBMS. Launched Workshops on Big Data Benchmarking, funded by NSF and industry sponsorships. 1st WBDB: May 2012, San Jose, hosted by Brocade. 2nd WBDB: December 2012, Pune, India, hosted by Persistent Systems / Infosys. 3rd WBDB: July 2013, Xi'an, China, hosted by Xi'an University. ~130 attendees (including duplicates) + ~40 today.
8 1st WBDB Attendee Organizations: Actian, AMD, BMMsoft, Brocade, CA Labs, Cisco, Cloudera, Convey Computer, CWI/Monet, Dell, EPFL, Facebook, Google, Greenplum, Hewlett-Packard, Hortonworks, Indiana Univ / Hathitrust Research Foundation, InfoSizing, Intel, LinkedIn, MapR/Mahout, Mellanox, Microsoft, NSF, NetApp, NetApp/OpenSFS, Oracle, San Diego Supercomputer Center, SAS, Scripps Research Institute, Seagate, Shell, SNIA, Teradata Corporation, Twitter, UC Irvine, Univ. of Minnesota, Univ. of Toronto, Univ. of Washington, VMware, WhamCloud, Yahoo!, Red Hat
9 4th WBDB: 3rd WBDB:
10 WBDB Outcomes. Big Data Benchmarking Community (BDBC) mailing list (~160 members from ~75 organizations). (Remote) talks every other Thursday at 9 AM US Pacific time. Selected papers to be published in Springer-Verlag LNCS: 2012 and 2013 issues. Paper from the First Workshop: "Setting the Direction for Big Data Benchmark Standards" by C. Baru, M. Bhandarkar, R. Nambiar, M. Poess, and T. Rabl, published in Selected Topics in Performance Evaluation and Benchmarking, Springer-Verlag. Article in the inaugural issue of the Big Data Journal: "Big Data Benchmarking and the Big Data Top100 List" by Baru, Bhandarkar, Nambiar, Poess, Rabl, Big Data Journal, Vol. 1, No. 1, 60-64, Mary Ann Liebert, Inc. Formation of the TPC-BD Subcommittee on Big Data benchmarking.
11 Current Status: Issues Discussed at the Workshops. Different types of benchmarks for different aspects of a system. Micro-benchmarks: specific lower-level system operations; I/O operations, e.g. "A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters", Panda et al., OSU. Functional benchmarks: Terasort; basic SQL, i.e. individual SQL operations such as Select, Project, Join, Order-By. Genre-specific benchmarks: e.g. Graph500. Application-level benchmarks: measure system-level performance of hardware and software for a given dataset and workload (a given application scenario), e.g. TPC benchmarks such as TPC-C, TPC-H, TPC-DS.
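To make the distinction between the first two benchmark types concrete, here is a minimal Python sketch. It is not taken from any of the benchmarks named on the slide: the function names, data sizes, and the choice of a local file write/read as the "low-level operation" are assumptions for illustration. The micro-benchmark times a single I/O operation, while the functional benchmark times one complete function (a key sort, in the spirit of Terasort but at toy scale).

```python
import os
import random
import tempfile
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time (seconds) over several runs of fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def micro_write_read(num_bytes=1_000_000):
    """Micro-benchmark: one low-level operation (write, then read, a file)."""
    payload = os.urandom(num_bytes)
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(payload)
        path = f.name
    try:
        with open(path, "rb") as f:
            assert len(f.read()) == num_bytes
    finally:
        os.remove(path)


def functional_sort(num_records=100_000, seed=42):
    """Functional benchmark: one complete function (sort records by key)."""
    rng = random.Random(seed)
    records = [(rng.getrandbits(64), i) for i in range(num_records)]
    records.sort(key=lambda r: r[0])
    return records


results = {
    "micro: file write+read": time_call(micro_write_read),
    "functional: sort": time_call(functional_sort),
}
for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s")
```

An application-level benchmark would instead time a whole scenario, combining many such operations over a defined dataset and workload.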
12 Benchmark Design Issues. Audience: Who is the audience for such a benchmark? Marketing (customers / end users), internal use (engineering), academic use. Application: What is the application that should be modeled? Abstractions of a data pipeline, e.g. Internet-scale business. Should the benchmark be for innovation or competition? Successful competitive benchmarks will be used for innovation.
13 Design Issues - 2. Single benchmark specification: Is it possible to develop a single benchmark to capture characteristics of multiple applications? Single, multi-step benchmark, with plausible end-to-end scenario. Component vs. end-to-end benchmark: Is it possible to factor out a set of benchmark components, which can be isolated and plugged into an end-to-end benchmark? The benchmark should consist of individual components that ultimately make up an end-to-end benchmark.
14 Design Issues - 3. Paper and pencil vs. implementation-based: Should the implementation be specification-driven or implementation-driven? Start with an implementation and develop the specification at the same time. Reuse: Can we reuse existing benchmarks? Leverage existing work and built-up knowledge base. Benchmark data: Where do we get the data from? Synthetic data generation: structured, non-structured data. Verifiability: Should there be a process for verification of results? YES!
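The synthetic-data and verifiability points can be illustrated together. The sketch below is an assumption-laden toy, not a generator from any WBDB proposal: the row layout, the Zipf-like key weights, and the use of a SHA-256 digest as the verification artifact are all chosen for illustration. The idea is that a seeded generator produces the same data on every run, so a published checksum lets a third party confirm that a result was obtained on the intended dataset.

```python
import hashlib
import random


def generate_rows(num_rows, seed, num_keys=100):
    """Yield pipe-delimited rows deterministically from a seed.

    Keys follow a Zipf-like skew (weight 1/rank); this skew is an
    illustrative assumption, not a rule from any benchmark specification.
    """
    rng = random.Random(seed)
    keys = list(range(num_keys))
    weights = [1.0 / (rank + 1) for rank in range(num_keys)]
    for row_id in range(num_rows):
        key = rng.choices(keys, weights=weights)[0]
        amount = round(rng.uniform(1.0, 500.0), 2)
        yield f"{row_id}|{key}|{amount}"


def checksum(rows):
    """Digest of the generated data, usable as a verification artifact."""
    digest = hashlib.sha256()
    for row in rows:
        digest.update(row.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


first = checksum(generate_rows(10_000, seed=7))
second = checksum(generate_rows(10_000, seed=7))
print("repeatable:", first == second)
```

Because generation is driven entirely by the seed and row count, the dataset never has to be shipped; only the generator, its parameters, and the expected digest do.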
15 Abstractions of the Big Data World from WBDB. (1) Enterprise warehouse + agglomeration of other data: structured enterprise data warehouse, extended to incorporate data from other non-fully structured data sources (e.g. weblogs, text, streams). (2) Pool of data with sequence of processing: enterprise data processing as a pipeline from data ingestion to transformation, extraction, subsetting, machine learning, predictive analytics; data from multiple structured and non-structured sources.
16 Proposal 1: BigBench. Ghazal et al.: Teradata, Oracle, U. of Toronto, InfoSizing. Derived from TPC-Decision Support (TPC-DS): multiple snowflake schemas with shared dimensions; 24 tables with an average of 18 columns; 99 distinct SQL-99 queries with random substitutions; more representative skewed database content; sub-linear scaling of non-fact tables; ad-hoc, reporting, iterative and extraction queries; ETL-like data maintenance.
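Two of the TPC-DS properties listed above, queries with random substitutions and sub-linear scaling of non-fact tables, can be sketched in a few lines of Python. Everything specific here is hypothetical: the query template, the table and column names, and the square-root scaling rule are illustrative assumptions and are not taken from the TPC-DS or BigBench specifications.

```python
import math
import random

# Hypothetical query template; the real queries are defined in the
# benchmark specification, not here.
TEMPLATE = (
    "SELECT item_id, SUM(quantity) FROM sales "
    "WHERE category = '{category}' AND year = {year} GROUP BY item_id"
)
CATEGORIES = ["books", "music", "toys"]


def instantiate(template, seed):
    """Fill template parameters from a seeded RNG so runs are repeatable."""
    rng = random.Random(seed)
    return template.format(
        category=rng.choice(CATEGORIES),
        year=rng.randint(2000, 2012),
    )


def table_rows(base_rows, scale_factor, is_fact):
    """Fact tables grow linearly; other tables grow sub-linearly.

    The square-root rule is an assumption for illustration only.
    """
    if is_fact:
        return base_rows * scale_factor
    return int(base_rows * math.sqrt(scale_factor))


print(instantiate(TEMPLATE, seed=1))
for sf in (1, 100):
    print(sf, table_rows(1_000_000, sf, True), table_rows(10_000, sf, False))
```

Random substitution keeps a system from caching or hand-tuning for fixed literals, while seeding keeps each run reproducible. Sub-linear growth of dimension-style tables reflects that, for example, a retailer's item catalog does not grow as fast as its sales history.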
17 BigBench Data Model. Workload = set of queries on structured, semi-structured, and unstructured data; data mining, ML. Paper published in ACM SIGMOD. Full specification to appear in WBDB2012 publication.
18 Proposal 2: Deep Analytics Pipeline. An end-to-end data processing pipeline: data from multiple sources; loose, flexible schema; data requires structuring; ELT rather than ETL. Application characteristics: processing pipelines; running models with data. Stages: Acquisition/Recording, Extraction/Cleaning/Annotation, Integration/Aggregation/Representation, Analysis/Modeling, Interpretation.
19 Example of an Application: User Modeling. Objective: determine user interests by mining user activities. Large dimensionality of possible user activities. Typical user has sparse activity vector. Event attributes change over time.
20 User Modeling Pipeline: Data Acquisition; Sessionization; Feature and Target Generation; Model Training; Offline Scoring & Evaluation; Batch Scoring & Upload to serving.
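The sessionization and feature-generation steps of such a pipeline can be sketched as follows. This is a minimal illustration, not the pipeline from the proposal: the event tuple layout, the 30-minute inactivity threshold, and the function names are assumptions. Features are stored as a dictionary of non-zero counts, which matches the observation that a typical user has a sparse activity vector.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Group (user, timestamp, activity) events into per-user sessions.

    A new session starts when the time since the user's previous event
    exceeds `gap` seconds.
    """
    by_user = defaultdict(list)
    for user, ts, activity in events:
        by_user[user].append((ts, activity))

    sessions = defaultdict(list)
    for user, user_events in by_user.items():
        user_events.sort()
        current = [user_events[0]]
        for prev, event in zip(user_events, user_events[1:]):
            if event[0] - prev[0] > gap:
                sessions[user].append(current)
                current = []
            current.append(event)
        sessions[user].append(current)
    return dict(sessions)


def sparse_features(session):
    """Count activities in a session, storing only the non-zero entries."""
    counts = {}
    for _, activity in session:
        counts[activity] = counts.get(activity, 0) + 1
    return counts


events = [
    ("u1", 0, "search"),
    ("u1", 60, "click"),
    ("u1", 4000, "search"),
    ("u2", 10, "view"),
]
sessions = sessionize(events)
features = {u: [sparse_features(s) for s in ss] for u, ss in sessions.items()}
print(features)
```

With the sample events, user `u1` ends up with two sessions because the third event arrives more than 30 minutes after the second. The resulting per-session counts would feed the model training step.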
21 Next Steps. TPC-BD subcommittee: join TPC if you want to influence that process. BigData Top100 List: an open, community effort to rank systems by performance (with price/performance) on Big Data workloads. "HPC meets enterprise": combine ideas from TPC and Top500. TPC has influenced design and efficiency of DBMSs over 25 years. Borrow the ranking concept from Top500, but include price/performance and green metrics.
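A ranked list that also reports price/performance and an energy metric could be computed along these lines. This is only a sketch of the idea: the `Submission` fields, the throughput unit, and all figures are made up for illustration, and the slide does not define how the actual BigData Top100 metrics would be specified.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    system: str
    throughput: float   # hypothetical performance metric, higher is better
    price_usd: float    # total system price
    energy_kwh: float   # energy used during the benchmark run

    @property
    def price_performance(self):
        """Dollars per unit of throughput; lower is better."""
        return self.price_usd / self.throughput

    @property
    def energy_per_unit(self):
        """kWh per unit of throughput; lower is better."""
        return self.energy_kwh / self.throughput


def rank(submissions):
    """Order by raw performance, best first, in the style of a Top500 list."""
    return sorted(submissions, key=lambda s: s.throughput, reverse=True)


# Made-up figures for illustration only.
entries = [
    Submission("A", throughput=1200.0, price_usd=600_000, energy_kwh=90.0),
    Submission("B", throughput=1500.0, price_usd=1_500_000, energy_kwh=150.0),
    Submission("C", throughput=800.0, price_usd=200_000, energy_kwh=40.0),
]
for position, s in enumerate(rank(entries), start=1):
    print(position, s.system, s.throughput,
          f"{s.price_performance:.0f} $/unit",
          f"{s.energy_per_unit:.3f} kWh/unit")
```

In this toy data the fastest system is also the most expensive per unit of work, which is the kind of trade-off that reporting price/performance alongside the raw ranking is meant to expose.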
22 Next Steps: BigData Community Challenges. Challenges related to the Deep Analytics Pipeline: definition of each step; ideas for machine learning and predictive analytics steps; ideas for metrics: performance and price/performance. Announce competitions via Kaggle and other venues.
23 5th WBDB. Would like to host it in Europe (Germany?) around Summer 2014. Looking for interested hosts, sponsors, local organizers.
More informationNoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
More informationGraySort and MinuteSort at Yahoo on Hadoop 0.23
GraySort and at Yahoo on Hadoop.23 Thomas Graves Yahoo! May, 213 The Apache Hadoop[1] software library is an open source framework that allows for the distributed processing of large data sets across clusters
More informationAmerica s Most Wanted a metric to detect persistently faulty machines in Hadoop
America s Most Wanted a metric to detect persistently faulty machines in Hadoop Dhruba Borthakur and Andrew Ryan dhruba,andrewr1@facebook.com Presented at IFIP Workshop on Failure Diagnosis, Chicago June
More informationWhite Paper February 2010. IBM InfoSphere DataStage Performance and Scalability Benchmark Whitepaper Data Warehousing Scenario
White Paper February 2010 IBM InfoSphere DataStage Performance and Scalability Benchmark Whitepaper Data Warehousing Scenario 2 Contents 5 Overview of InfoSphere DataStage 7 Benchmark Scenario Main Workload
More informationGet More Scalability and Flexibility for Big Data
Solution Overview LexisNexis High-Performance Computing Cluster Systems Platform Get More Scalability and Flexibility for What You Will Learn Modern enterprises are challenged with the need to store and
More informationDriving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA
WHITE PAPER April 2014 Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA Executive Summary...1 Background...2 File Systems Architecture...2 Network Architecture...3 IBM BigInsights...5
More informationPlatfora Big Data Analytics
Platfora Big Data Analytics ISV Partner Solution Case Study and Cisco Unified Computing System Platfora, the leading enterprise big data analytics platform built natively on Hadoop and Spark, delivers
More informationHadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com
Hadoop Distributed File System Dhruba Borthakur Apache Hadoop Project Management Committee dhruba@apache.org dhruba@facebook.com Hadoop, Why? Need to process huge datasets on large clusters of computers
More informationQLogic 16Gb Gen 5 Fibre Channel for Database and Business Analytics
QLogic 16Gb Gen 5 Fibre Channel for Database Assessment for Database and Business Analytics Using the information from databases and business analytics helps business-line managers to understand their
More informationBig Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
More information