CS 294: Big Data System Research: Trends and Challenges
|
|
- Brooke Parrish
- 8 years ago
- Views:
Transcription
1 CS 294: Big Data System Research: Trends and Challenges Fall 2015 (MW 9:30-11:00, 310 Soda Hall) Ion Stoica and Ali Ghodsi ( 1
2 Big Data First papers:» 2003: The Google file system paper» 2004: The MapReduce paper Today every major system & networking conference has Big Data sessions
3 Big Data Impact Already helped create new business Already helped disrupt existing businesses» Retail» Rental» Taxi» home appliances»
4 Big Data Stack Data Processing Layer Resource Management Layer Storage Layer
5 Hadoop Stack Hive Pig Data Processing ImpalaLayer Storm Hadoop MR Resource Hadoop Management Yarn Layer Storage HDFS, S3, Layer
6 The Berkeley AMPLab lgorithms January » 8 faculty» > 40 students» 3 software engineer team Organized for collaboration achines eople AMPCamp3 (August, 2013) 3 day retreats (twice a year) 220 campers (100+ companies)
7 The Berkeley AMPLab Governmental and industrial funding: Goal: Next generation of open source data analytics stack for industry & academia: Berkeley Data Analytics Stack (BDAS)
8 Spark Streaming BDAS Stack BlinkDB Data Processing Layer Shark SQL Spark Resource Management Mesos Layer Tachyon GraphX Storage Layer HDFS, S3, MLBase MLlib
9 BDAS & Hadoop fitting together Spark Streaming BlinkDB GraphX Shark SQL Spark Hadoop Mesos Yarn Tachyon HDFS, S3, HDFS, S3, MLBase MLlib
10 How do BDAS & Hadoop fit together? Spark Spark Streaming Straming BlinkDB Shark SQL BlinkDB Shark SQL Spark Graph X MLbase ML library Spark GraphX Hive Pig Hadoop MR MLBase MLlib Impala Storm Hadoop Mesos Yarn Tachyon HDFS, S3, HDFS, S3,
11 How do BDAS & Hadoop fit together? Spark Straming BlinkDB Shark SQL Graph X MLbase ML library Hive Pig Impala Storm Spark Hadoop MR Hadoop Mesos Yarn Tachyon HDFS, S3, HDFS, S3,
12 This Class Learn about state-of-art research in Big Data Work on an exciting project Hopefully start next generation of impactful projects
13 Grading Project: 60% Class presentations: 40%» Around 2 papers per student» See Randy s guidelines for leading discussion on papers CS294.F07/LeadingPapers.pdf 13
14 Administrative Information Class website: Office Hours (Soda 465D):» TBA Create an (anonymized) blog account for paper reviews if you don t have one yet (e.g., Sent me an by Monday, August 31, with your blog url» Preferred for the class list 14
15 Is the problem real? Papers What is the solution s main idea (nugget)? Why is solution different from previous work?» Are system assumptions different?» Is workload different?» Is problem new? Does the paper (or do you) identify any fundamental/hard trade-offs? 15
16 Papers (cont d) Do you think the work will be influential in 10 years?» Why or why not? Predicting the future hard, but worth a try» Look at past examples for inspiration 16
17 Streaming Over TCP Countless papers:» Why cannot be done» New protocols to do it Today» Virtually all streaming over TCP» Trend to stream over HTTP! 17
18 Why did it Succeed? 18
19 Multicast Countless papers:» Why world will come to a standstill without multicast» New protocols to do it Today» Multicast is used only in enterprise settings at best» Overlay multicast widely used in the Internet CDN based, e.g., WorldCup, March Madness, Iinagurations,... P2P, mostly popular outside US (e.g., China) 19
20 Why Did it Fail? 20
21 Shared Memory Countless papers:» How shared memory simplifies programming parallel computers» Many, many systems proposed and build Today:» Message passing (MPI) took over as the de facto standard for writing parallel applications 21
22 Why Did it Fail? 22
23 Network Computer Big in 90s» Promoted by an alliance of Sun, Oracle, Acorn Promise: many of advantages of cloud computing» Easy to manage» Application sharing» Failed miserably 23
24 Why Did it Fail? 24
25 Coming Back: ChromeOS Will it succeed this time? 25
26 What are Hard/Fundamental Tradeoffs? Brewer s CAP conjecture: Consistency, Availability, Partition-tolerance, you can have only two in a distributed system In a in-order, reliable communication protocol cannot minimize overhead and latency simultaneously Hard to simultaneously maximize evolvability and performance 26
Conquering Big Data with BDAS (Berkeley Data Analytics)
UC BERKELEY Conquering Big Data with BDAS (Berkeley Data Analytics) Ion Stoica UC Berkeley / Databricks / Conviva Extracting Value from Big Data Insights, diagnosis, e.g.,» Why is user engagement dropping?»
More informationBerkeley Data Analytics Stack:! Experience and Lesson Learned
UC BERKELEY Berkeley Data Analytics Stack:! Experience and Lesson Learned Ion Stoica UC Berkeley, Databricks, Conviva Research Philosophy Follow real problems Focus on novel usage scenarios Build real
More informationThe Berkeley AMPLab - Collaborative Big Data Research
The Berkeley AMPLab - Collaborative Big Data Research UC BERKELEY Anthony D. Joseph LASER Summer School September 2013 About Me Education: MIT SB, MS, PhD Joined Univ. of California, Berkeley in 1998 Current
More informationMaking Sense at Scale with Algorithms, Machines & People!
UC BERKELEY Making Sense at Scale with Algorithms, Machines & People PI: Michael Franklin University of California, Berkeley Expeditions in Computing PI Meeting May 15, 2013 2 The Berkeley AMPLab Sources
More informationBig Data Research in the AMPLab: BDAS and Beyond
Big Data Research in the AMPLab: BDAS and Beyond Michael Franklin UC Berkeley 1 st Spark Summit December 2, 2013 UC BERKELEY AMPLab: Collaborative Big Data Research Launched: January 2011, 6 year planned
More informationWhat s next for the Berkeley Data Analytics Stack?
What s next for the Berkeley Data Analytics Stack? Michael Franklin June 30th 2014 Spark Summit San Francisco UC BERKELEY AMPLab: Collaborative Big Data Research 60+ Students, Postdocs, Faculty and Staff
More informationAli Ghodsi Head of PM and Engineering Databricks
Making Big Data Simple Ali Ghodsi Head of PM and Engineering Databricks Big Data is Hard: A Big Data Project Tasks Tasks Build a Hadoop cluster Challenges Clusters hard to setup and manage Build a data
More informationHow Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
More informationNext-Gen Big Data Analytics using the Spark stack
Next-Gen Big Data Analytics using the Spark stack Jason Dai Chief Architect of Big Data Technologies Software and Services Group, Intel Agenda Overview Apache Spark stack Next-gen big data analytics Our
More informationBeyond Hadoop with Apache Spark and BDAS
Beyond Hadoop with Apache Spark and BDAS Khanderao Kand Principal Technologist, Guavus 12 April GITPRO World 2014 Palo Alto, CA Credit: Some stajsjcs and content came from presentajons from publicly shared
More informationSystems Engineering II. Pramod Bhatotia TU Dresden pramod.bhatotia@tu- dresden.de
Systems Engineering II Pramod Bhatotia TU Dresden pramod.bhatotia@tu- dresden.de About me! Since May 2015 2015 2012 Research Group Leader cfaed, TU Dresden PhD Student MPI- SWS Research Intern Microsoft
More informationConquering Big Data with Apache Spark
Conquering Big Data with Apache Spark Ion Stoica November 1 st, 2015 UC BERKELEY The Berkeley AMPLab January 2011 2017 8 faculty > 50 students 3 software engineer team Organized for collaboration achines
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationThe Berkeley Data Analytics Stack: Present and Future
The Berkeley Data Analytics Stack: Present and Future Michael Franklin 27 March 2014 Technion Big Day on Big Data UC BERKELEY BDAS in the Big Data Context 2 Sources Driving Big Data It s All Happening
More informationCase Study : 3 different hadoop cluster deployments
Case Study : 3 different hadoop cluster deployments Lee moon soo moon@nflabs.com HDFS as a Storage Last 4 years, our HDFS clusters, stored Customer 1500 TB+ data safely served 375,000 TB+ data to customer
More informationDell In-Memory Appliance for Cloudera Enterprise
Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert Armando_Acosta@Dell.com/
More informationMoving From Hadoop to Spark
+ Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee
More informationWhy Spark on Hadoop Matters
Why Spark on Hadoop Matters MC Srivas, CTO and Founder, MapR Technologies Apache Spark Summit - July 1, 2014 1 MapR Overview Top Ranked Exponential Growth 500+ Customers Cloud Leaders 3X bookings Q1 13
More informationIntroduction to Big Data! with Apache Spark" UC#BERKELEY#
Introduction to Big Data! with Apache Spark" UC#BERKELEY# This Lecture" The Big Data Problem" Hardware for Big Data" Distributing Work" Handling Failures and Slow Machines" Map Reduce and Complex Jobs"
More informationUnified Big Data Analytics Pipeline. 连 城 lian@databricks.com
Unified Big Data Analytics Pipeline 连 城 lian@databricks.com What is A fast and general engine for large-scale data processing An open source implementation of Resilient Distributed Datasets (RDD) Has an
More informationHow To Create A Data Visualization With Apache Spark And Zeppelin 2.5.3.5
Big Data Visualization using Apache Spark and Zeppelin Prajod Vettiyattil, Software Architect, Wipro Agenda Big Data and Ecosystem tools Apache Spark Apache Zeppelin Data Visualization Combining Spark
More informationArchiving and Sharing Big Data Digital Repositories, Libraries, Cloud Storage
Archiving and Sharing Big Data Digital Repositories, Libraries, Cloud Storage Cyrus Shahabi, Ph.D. Professor of Computer Science & Electrical Engineering Director, Integrated Media Systems Center (IMSC)
More informationTachyon: memory-speed data sharing
Tachyon: memory-speed data sharing Ali Ghodsi, Haoyuan (HY) Li, Matei Zaharia, Scott Shenker, Ion Stoica UC Berkeley Memory trumps everything else RAM throughput increasing exponentially Disk throughput
More informationUnified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
More informationPilot-Streaming: Design Considerations for a Stream Processing Framework for High- Performance Computing
Pilot-Streaming: Design Considerations for a Stream Processing Framework for High- Performance Computing Andre Luckow, Peter M. Kasson, Shantenu Jha STREAMING 2016, 03/23/2016 RADICAL, Rutgers, http://radical.rutgers.edu
More informationA Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
More informationLearning. Spark LIGHTNING-FAST DATA ANALYTICS. Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia
Compliments of Learning Spark LIGHTNING-FAST DATA ANALYTICS Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia Bring Your Big Data to Life Big Data Integration and Analytics Learn how to power
More informationAn Open Source Memory-Centric Distributed Storage System
An Open Source Memory-Centric Distributed Storage System Haoyuan Li, Tachyon Nexus haoyuan@tachyonnexus.com September 30, 2015 @ Strata and Hadoop World NYC 2015 Outline Open Source Introduction to Tachyon
More informationHadoop in the Enterprise
Hadoop in the Enterprise Modern Architecture with Hadoop 2 Jeff Markham Technical Director, APAC Hortonworks Hadoop Wave ONE: Web-scale Batch Apps relative % customers 2006 to 2012 Web-Scale Batch Applications
More informationCS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University
CS 555: DISTRIBUTED SYSTEMS [SPARK] Shrideep Pallickara Computer Science Colorado State University Frequently asked questions from the previous class survey Streaming Significance of minimum delays? Interleaving
More informationTachyon: A Reliable Memory Centric Storage for Big Data Analytics
Tachyon: A Reliable Memory Centric Storage for Big Data Analytics a Haoyuan (HY) Li, Ali Ghodsi, Matei Zaharia, Scott Shenker, Ion Stoica June 30 th, 2014 Spark Summit @ San Francisco UC Berkeley Outline
More informationSecond Credit Seminar Presentation on Big Data Analytics Platforms: A Survey
Second Credit Seminar Presentation on Big Data Analytics Platforms: A Survey By, Mr. Brijesh B. Mehta Admission No.: D14CO002 Supervised By, Dr. Udai Pratap Rao Computer Engineering Department S. V. National
More informationDeveloping Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
More informationHadoop2, Spark Big Data, real time, machine learning & use cases. Cédric Carbone Twitter : @carbone
Hadoop2, Spark Big Data, real time, machine learning & use cases Cédric Carbone Twitter : @carbone Agenda Map Reduce Hadoop v1 limits Hadoop v2 and YARN Apache Spark Streaming : Spark vs Storm Machine
More informationHadoop Big Data for Processing Data and Performing Workload
Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer
More informationSurvey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf
Survey of the Benchmark Systems and Testing Frameworks For Tachyon-Perf Rong Gu,Qianhao Dong 2014/09/05 0. Introduction As we want to have a performance framework for Tachyon, we need to consider two aspects
More informationAnalytics on Spark & Shark @Yahoo
Analytics on Spark & Shark @Yahoo PRESENTED BY Tim Tully December 3, 2013 Overview Legacy / Current Hadoop Architecture Reflection / Pain Points Why the movement towards Spark / Shark New Hybrid Environment
More informationHadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
More informationShark Installation Guide Week 3 Report. Ankush Arora
Shark Installation Guide Week 3 Report Ankush Arora Last Updated: May 31,2014 CONTENTS Contents 1 Introduction 1 1.1 Shark..................................... 1 1.2 Apache Spark.................................
More informationInformation Builders Mission & Value Proposition
Value 10/06/2015 2015 MapR Technologies 2015 MapR Technologies 1 Information Builders Mission & Value Proposition Economies of Scale & Increasing Returns (Note: Not to be confused with diminishing returns
More information6.S897 Large-Scale Systems
6.S897 Large-Scale Systems Instructor: Matei Zaharia" Fall 2015, TR 2:30-4, 34-301 bit.ly/6-s897 Outline What this course is about" " Logistics" " Datacenter environment What this Course is About Large-scale
More informationA Reliable Memory-Centric Distributed Storage System a
A Reliable Memory-Centric Distributed Storage System a Haoyuan Li October 16 @ Strata & Hadoop World NYC Website: tachyon-project.org Meetup: www.meetup.com/tachyon UC Berkeley Outline Overview Feature
More informationThe Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
More informationBig Data and Industrial Internet
Big Data and Industrial Internet Keijo Heljanko Department of Computer Science and Helsinki Institute for Information Technology HIIT School of Science, Aalto University keijo.heljanko@aalto.fi 16.6-2015
More informationFast and Expressive Big Data Analytics with Python. Matei Zaharia UC BERKELEY
Fast and Expressive Big Data Analytics with Python Matei Zaharia UC Berkeley / MIT UC BERKELEY spark-project.org What is Spark? Fast and expressive cluster computing system interoperable with Apache Hadoop
More informationSpark: Making Big Data Interactive & Real-Time
Spark: Making Big Data Interactive & Real-Time Matei Zaharia UC Berkeley / MIT www.spark-project.org What is Spark? Fast and expressive cluster computing system compatible with Apache Hadoop Improves efficiency
More informationTE's Analytics on Hadoop and SAP HANA Using SAP Vora
TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -
More informationArchitectures for massive data management
Architectures for massive data management Apache Spark Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Spark Motivation Apache Spark Figure: IBM and Apache Spark What is Apache Spark Apache
More informationReal-Time Analytical Processing (RTAP) Using the Spark Stack. Jason Dai jason.dai@intel.com Intel Software and Services Group
Real-Time Analytical Processing (RTAP) Using the Spark Stack Jason Dai jason.dai@intel.com Intel Software and Services Group Project Overview Research & open source projects initiated by AMPLab in UC Berkeley
More informationBig Data Analytics. Lucas Rego Drumond
Big Data Analytics Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Apache Spark Apache Spark 1
More informationOracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
More informationBig Data Open Source Stack vs. Traditional Stack for BI and Analytics
Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.
More informationCIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.
CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Cloud Computing and Amazon Web Services Cloud Computing Amazon
More informationxpaaerns on Spark, Shark, Tachyon and Mesos
xpaaerns on Spark, Shark, Tachyon and Mesos Spark Summit 2014 Claudiu Barbura Sr. Director of Engineering A>geo Agenda xpa&erns Architecture From Hadoop to BDAS & our contribu
More informationEnabling High performance Big Data platform with RDMA
Enabling High performance Big Data platform with RDMA Tong Liu HPC Advisory Council Oct 7 th, 2014 Shortcomings of Hadoop Administration tooling Performance Reliability SQL support Backup and recovery
More informationBig Data Analytics Hadoop and Spark
Big Data Analytics Hadoop and Spark Shelly Garion, Ph.D. IBM Research Haifa 1 What is Big Data? 2 What is Big Data? Big data usually includes data sets with sizes beyond the ability of commonly used software
More informationNative Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
More informationBig Data Frameworks Course. Prof. Sasu Tarkoma 10.3.2015
Big Data Frameworks Course Prof. Sasu Tarkoma 10.3.2015 Contents Course Overview Lectures Assignments/Exercises Course Overview This course examines current and emerging Big Data frameworks with focus
More informationCSE-E5430 Scalable Cloud Computing Lecture 11
CSE-E5430 Scalable Cloud Computing Lecture 11 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 30.11-2015 1/24 Distributed Coordination Systems Consensus
More informationStratioDeep. An integration layer between Cassandra and Spark. Álvaro Agea Herradón Antonio Alcocer Falcón
StratioDeep An integration layer between Cassandra and Spark Álvaro Agea Herradón Antonio Alcocer Falcón StratioDeep An integration layer between Cassandra and Spark Álvaro Agea Herradón Antonio Alcocer
More informationMesos: A Platform for Fine- Grained Resource Sharing in Data Centers (II)
UC BERKELEY Mesos: A Platform for Fine- Grained Resource Sharing in Data Centers (II) Anthony D. Joseph LASER Summer School September 2013 My Talks at LASER 2013 1. AMP Lab introduction 2. The Datacenter
More informationA survey on platforms for big data analytics
Singh and Reddy Journal of Big Data 2014, 1:8 SURVEY PAPER Open Access A survey on platforms for big data analytics Dilpreet Singh and Chandan K Reddy * * Correspondence: reddy@cs.wayne.edu Department
More informationHadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
More informationCray XC30 Hadoop Platform Jonathan (Bill) Sparks Howard Pritchard Martha Dumler
Cray XC30 Hadoop Platform Jonathan (Bill) Sparks Howard Pritchard Martha Dumler Safe Harbor Statement This presentation may contain forward-looking statements that are based on our current expectations.
More informationTRAINING PROGRAM ON BIGDATA/HADOOP
Course: Training on Bigdata/Hadoop with Hands-on Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30-17:30 Hrs Venue: Eagle Photonics Pvt Ltd First Floor, Plot No 31, Sector 19C, Vashi,
More informationDANIEL EKLUND UNDERSTANDING BIG DATA AND THE HADOOP TECHNOLOGIES NOVEMBER 2-3, 2015 RESIDENZA DI RIPETTA - VIA DI RIPETTA, 231 ROME (ITALY)
LA TECHNOLOGY TRANSFER PRESENTS PRESENTA DANIEL EKLUND UNDERSTANDING BIG DATA AND THE HADOOP TECHNOLOGIES NOVEMBER 2-3, 2015 RESIDENZA DI RIPETTA - VIA DI RIPETTA, 231 ROME (ITALY) info@technologytransfer.it
More informationSAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES
SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES AWS GLOBAL INFRASTRUCTURE 10 Regions 25 Availability Zones 51 Edge locations WHAT
More informationSpark and the Big Data Library
Spark and the Big Data Library Reza Zadeh Thanks to Matei Zaharia Problem Data growing faster than processing speeds Only solution is to parallelize on large clusters» Wide use in both enterprises and
More informationProcessing NGS Data with Hadoop-BAM and SeqPig
Processing NGS Data with Hadoop-BAM and SeqPig Keijo Heljanko 1, André Schumacher 1,2, Ridvan Döngelci 1, Luca Pireddu 3, Matti Niemenmaa 1, Aleksi Kallio 4, Eija Korpelainen 4, and Gianluigi Zanetti 3
More informationCopyright 2012, Oracle and/or its affiliates. All rights reserved.
1 Oracle Big Data Appliance Releases 2.5 and 3.0 Ralf Lange Global ISV & OEM Sales Agenda Quick Overview on BDA and its Positioning Product Details and Updates Security and Encryption New Hadoop Versions
More informationwww.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage
www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization
More informationMachine- Learning Summer School - 2015
Machine- Learning Summer School - 2015 Big Data Programming David Franke Vast.com hbp://www.cs.utexas.edu/~dfranke/ Goals for Today Issues to address when you have big data Understand two popular big data
More informationDominik Wagenknecht Accenture
Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna
More informationHPC ABDS: The Case for an Integrating Apache Big Data Stack
HPC ABDS: The Case for an Integrating Apache Big Data Stack with HPC 1st JTC 1 SGBD Meeting SDSC San Diego March 19 2014 Judy Qiu Shantenu Jha (Rutgers) Geoffrey Fox gcf@indiana.edu http://www.infomall.org
More informationBig Data. Lyle Ungar, University of Pennsylvania
Big Data Big data will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus. McKinsey Data Scientist: The Sexiest Job of the 21st Century -
More informationChukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
More informationHiBench Introduction. Carson Wang (carson.wang@intel.com) Software & Services Group
HiBench Introduction Carson Wang (carson.wang@intel.com) Agenda Background Workloads Configurations Benchmark Report Tuning Guide Background WHY Why we need big data benchmarking systems? WHAT What is
More informationESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
More informationLarge scale processing using Hadoop. Ján Vaňo
Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine
More informationMaking Sense at Scale with the Berkeley Data Analytics Stack
Making Sense at Scale with the Berkeley Data Analytics Stack Michael Franklin February 3, 2015 WSDM 2015 Shanghai UC BERKELEY Agenda A Little Bit on Big Data AMPLab Overview BDAS Philosophy: Unification
More informationA Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
More informationEducational Opportunities in Big Data
Educational Opportunities in Big Data Could current Big Gaps in Talent fill the void and Big Market Demand? Dr. KRS Murthy Dr.Sri.Murthy@Gmail.Com BigDataExpert@Gmail.Com (408)-464-3333 Big Gaps in Big
More informationWHITE PAPER. Building Big Data Analytical Applications at Scale Using Existing ETL Skillsets INTELLIGENT BUSINESS STRATEGIES
INTELLIGENT BUSINESS STRATEGIES WHITE PAPER Building Big Data Analytical Applications at Scale Using Existing ETL Skillsets By Mike Ferguson Intelligent Business Strategies June 2015 Prepared for: Table
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationCloudera Enterprise Reference Architecture for Google Cloud Platform Deployments
Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and
More informationBig Data Processing. Patrick Wendell Databricks
Big Data Processing Patrick Wendell Databricks About me Committer and PMC member of Apache Spark Former PhD student at Berkeley Left Berkeley to help found Databricks Now managing open source work at Databricks
More informationIn-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet
In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet Ema Iancuta iorhian@gmail.com Radu Chilom radu.chilom@gmail.com Buzzwords Berlin - 2015 Big data analytics / machine
More informationWorkshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
More informationSpark Application on AWS EC2 Cluster:
Spark Application on AWS EC2 Cluster: ImageNet dataset classification using Multinomial Naive Bayes Classifier Chao-Hsuan Shen, Patrick Loomis, John Geevarghese 1 Acknowledgements Special thanks to professor
More informationBig Data Realities Hadoop in the Enterprise Architecture
Big Data Realities Hadoop in the Enterprise Architecture Paul Phillips Director, EMEA, Hortonworks pphillips@hortonworks.com +44 (0)777 444 3857 Hortonworks Inc. 2012 Page 1 Agenda The Growth of Enterprise
More informationWWW.WIPRO.COM HADOOP VENDOR DISTRIBUTIONS THE WHY, THE WHO AND THE HOW? Guruprasad K.N. Enterprise Architect Wipro BOTWORKS
WWW.WIPRO.COM HADOOP VENDOR DISTRIBUTIONS THE WHY, THE WHO AND THE HOW? Guruprasad K.N. Enterprise Architect Wipro BOTWORKS Table of contents 01 Abstract 01 02 03 04 The Why - Need for The Who - Prominent
More informationNative Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
More informationThere's Plenty of Room in the Cloud
There's Plenty of Room in the Cloud [Shameless reference to Feynman s talk from 1959] Lecturer: Zoran Dimitrijevic Altiscale, Inc. Spring 2015 CS290B -- Cloud Computing 50 Years of Moore
More informationAzure Data Lake Analytics
Azure Data Lake Analytics Compose and orchestrate data services at scale Fully managed service to support orchestration of data movement and processing Connect to relational or non-relational data
More informationHadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services
Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the
More informationDistributed DataFrame on Spark: Simplifying Big Data For The Rest Of Us
DATA INTELLIGENCE FOR ALL Distributed DataFrame on Spark: Simplifying Big Data For The Rest Of Us Christopher Nguyen, PhD Co-Founder & CEO Agenda 1. Challenges & Motivation 2. DDF Overview 3. DDF Design
More informationHadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015
Hadoop MapReduce and Spark Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Outline Hadoop Hadoop Import data on Hadoop Spark Spark features Scala MLlib MLlib
More informationSelf-service BI for big data applications using Apache Drill
Self-service BI for big data applications using Apache Drill 2015 MapR Technologies 2015 MapR Technologies 1 Data Is Doubling Every Two Years Unstructured data will account for more than 80% of the data
More informationAn Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
More informationSOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
More information