Welcome to the 6th Workshop on Big Data Benchmarking
TILMANN RABL
MIDDLEWARE SYSTEMS RESEARCH GROUP
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
UNIVERSITY OF TORONTO
BANKMARK
Please note! This workshop may be monitored and recorded for quality assurance. 6/16/2015 (C) TILMANN RABL - 2015 2
Thanks to our Sponsors & Supporters
[Sponsor logos: Platinum Sponsor; Gold Sponsor; Silver Sponsor; Supported by]
Special thanks: Esther, Meredith, and Mimi @ Fields
And of course: our speakers and attendees
WiFi & Power
WiFi: SSID: UofT; User: WBDB2015; Login: login2015
Power: There are power outlets in the first few rows. Don't be shy.
Some History
Genesis of the Big Data Benchmarking effort
Grant from NSF under the Cluster Exploratory (CluE) program (Chaitan Baru, SDSC)
Chaitan Baru (SDSC), Tilmann Rabl (University of Toronto), Milind Bhandarkar (Pivotal/Greenplum), Raghu Nambiar (Cisco), Meikel Poess (Oracle)
Launched the Workshops on Big Data Benchmarking
First WBDB: May 2012, San Jose, hosted by Brocade
Objectives
Lay the groundwork for industry standards that measure the effectiveness of hardware and software technologies dealing with big data
Exploit synergies between benchmarking efforts
Offer a forum for presenting and debating platforms, workloads, data sets, and metrics relevant to big data
Further Workshops
2nd WBDB: http://clds.sdsc.edu/wbdb2012.in
3rd WBDB: http://clds.sdsc.edu/wbdb2013.cn
4th WBDB: http://clds.sdsc.edu/wbdb2013.us
5th WBDB: http://clds.sdsc.edu/wbdb2014.de
First Outcomes
Big Data Benchmark Community (BDBC)
Regular conference calls for talks and announcements
Open to anyone interested, free of charge
BDBC makes no claims to any developments or ideas
clds.ucsd.edu/bdbc/community
200 members, 80 organizations
Further Outcomes
Selected papers published in Springer Verlag, Lecture Notes in Computer Science (LNCS):
LNCS 8163: Specifying Big Data Benchmarks (covering the first and second workshops)
LNCS 8585: Advancing Big Data Benchmarks (covering the third and fourth workshops)
LNCS 8991: Big Data Benchmarking (covering the fifth workshop)
Formation of the TPC Subcommittee on Big Data Benchmarking
Working on TPCx-HS: TPC Express Benchmark for Hadoop Systems, based on TeraSort
http://www.tpc.org/tpcbd/
Proposal of the BigData Top100 List
Specification and kit of BigBench (we will hear more about this)
Also: Formation of a SPEC Research Group on Big Data Benchmarking
Objectives:
Defining clear goals for the aspects of a Big Data system that are important to measure
Developing sound rules and metrics to measure the performance of Big Data systems
Fostering collaboration among related benchmarking efforts
Initial committee structure:
Tilmann Rabl (Chair)
Chaitan Baru (Vice Chair)
Meikel Poess (Secretary)
John Poelman (Release Manager)
Intended to replace the less formal BDBC group
Will meet tomorrow as part of WBDB
And now the Future
Agenda Day 1
Start time | Title | Speaker
8:30 am | Breakfast |
9:00 am | Opening & Introduction Round | Tilmann Rabl
9:45 am | Academic Keynote: Waterloo Benchmarks for Graph Data Management | Tamer Özsu
10:45 am | What is BigBench | Michael Frank
11:30 am | Tuning and Optimizing an end to end benchmark - Our Experience tuning Big data Benchmark for BigBench Specification | Yi Zhou
12:00 pm | Lunch |
1:00 pm | Invited Presentation: PerfKit - Benchmarking the Cloud | Ivan Filho
1:45 pm | Lessons Learned: Developing a Standard | Raghunath Nambiar
2:10 pm | BigBench Evolution: Observations and Recommendations | John Poelman
2:35 pm | Performance Evaluation of Spark SQL using BigBench | Todor Ivanov
3:00 pm | Coffee break |
3:30 pm | Introduction to Discussion | Tilmann Rabl
3:45 pm | Breakout discussion |
5:15 pm | Consolidation |
6:00 pm | Dinner |
Agenda Day 2
Start time | Title | Speaker
8:30 am | Breakfast |
9:00 am | Recap | Tilmann Rabl
9:15 am | Industrial Keynote: SAP HANA Platform Evolution from In Memory RDBMS to Enterprise Big Data Infrastructure | Anil Goel
10:15 am | Announcement of the First Big Data Benchmarking Challenge | Tilmann Rabl
10:30 am | Using BigBench to Evaluate an Automated Physical Design of Materialized Views | Jiang Du
10:45 am | Set of Metrics to Evaluate HDFS and S3 Performance on Amazon EMR with Avro and Parquet Formats | Zeev Lieber
11:10 am | Benchmarking the Availability and Fault Tolerance of Cassandra | Marten Rosselli
11:35 am | Big Data Benchmarking needs Big Metadata Generation | Boris Glavic
12:00 pm | Lunch |
1:00 pm | Invited Presentation: BigDataBench: An open-source Big Data Benchmark suite | Jianfeng Zhan
1:45 pm | From performance profiling to Predictive Analytics while evaluating Hadoop cost-effectiveness using ALOJA | Nicolas Poggi
2:10 pm | A Case of Study on Hadoop Benchmark Behavior Modeling Using ALOJA-ML | Josep Berral
2:35 pm | ALOJA-HDI: A characterization of cost-effectiveness of PaaS Hadoop on the Azure Cloud | Aaron Call
3:00 pm | Coffee break |
3:30 pm | SPEC Research Group on Big Data Meeting |
5:30 pm | Closing Remarks and Announcement of WBDB2015.in | Tilmann Rabl
Logistics
[Venue map: "You are here", washrooms]
Lunch is just outside (as is coffee)
Additional rooms for discussions: one level below (BA025, BA026)
Dinner
Hart House Gallery Grill
6:00 pm Reception
7:00 pm Dinner
[Map: dinner venue and "You are here"]
Discussion Session
We will break up into three groups, in three rooms: here, BA025, and BA026
Each group will work on the same topic
Each group needs a discussion leader and a scribe
I have topics, centered around BigBench
Feel free to suggest your own!
Introduction Round
Who am I?
What is my institution?
How did I end up at this workshop?
M. Tamer Özsu - University of Waterloo
Bio
Professor of Computer Science and Dean (Research) of the Faculty of Mathematics at the University of Waterloo
Fellow of the ACM and the IEEE; member of the Science Academy of Turkey, Sigma Xi, and the American Association for the Advancement of Science (AAAS)
Keynote
Waterloo Benchmarks for Graph Data Management
Anil Goel - SAP
Bio
Chief Architect at SAP
PhD CS (University of Waterloo)
Keynote
SAP HANA Platform Evolution from In Memory RDBMS to Enterprise Big Data Infrastructure
Thank You!
See you in December in New Delhi!