Setting the Direction for Big Data Benchmark Standards


Chaitan Baru (San Diego Supercomputer Center, UC San Diego, USA), Milind Bhandarkar (Greenplum, EMC, USA), Raghunath Nambiar (Cisco Systems, Inc., USA), Meikel Poess (Oracle Corporation, USA), Tilmann Rabl (University of Toronto)

Abstract. We provide a summary of the outcomes of the Workshop on Big Data Benchmarking (WBDB2012), held on May 8-9, 2012 in San Jose, CA. The workshop discussed a number of issues related to big data benchmark definitions and benchmark processes. It was attended by 60 invitees representing 45 different organizations from industry and academia. Attendees were chosen based on their experience and expertise in one or more areas of big data, database systems, performance benchmarking, and big data applications. The workshop participants concluded that there was both a need and an opportunity for defining benchmarks to capture the end-to-end aspects of big data applications. Benchmark metrics would need to include metrics for performance as well as price/performance. The benchmark would need to consider several costs, including total system cost, setup cost, and energy costs. The next Big Data Benchmarking workshop is currently being planned for December 17-18, 2012 in Pune, India.

Keywords: Industry Standard Benchmarks, Big Data

1 Introduction

The world has been in the midst of an extraordinary information explosion over the past decade, punctuated by rapid growth in the use of the Internet and the number of connected devices worldwide.
Today, we are seeing a rate of change faster than at any point in history, and both enterprise application data and machine-generated data, known as Big Data, continue to grow exponentially, challenging industry experts and researchers to develop new, innovative techniques to evaluate and benchmark hardware and software technologies and products.(1)

(1) The WBDB2012 workshop was funded by a grant from the National Science Foundation (Grant# IIS ) and by sponsorship from Brocade, Greenplum, Mellanox, and Seagate.

As illustrated in Fig. 1, studies have estimated that the total amount of enterprise data will grow from about 0.5 zettabytes in 2008 to 35 zettabytes.

Fig. 1: Amount of enterprise data growth from 2008 to 2010

As indicated in Fig. 2, the phenomenon is global in nature, with Asia rapidly emerging as a major source of data users as well as data generators. With the increasing penetration of data-driven computing, Web and mobile technologies, and enterprise computing, emerging markets have the potential to add further to the already rapid growth in data.

Fig. 2: Internet users in the world, distributed by world regions

Confronting this data deluge and tackling big data applications requires the development of benchmarks to help quantify the effectiveness of the system architectures that support such applications. The First Workshop on Big Data Benchmarking (WBDB) [15] is a first important step towards the development of benchmarks that provide objective measures of the effectiveness of hardware and software systems dealing with big data applications. These benchmarks would facilitate the evaluation of alternative solutions and provide for comparisons among different solution approaches. The benchmarks need to characterize the new feature sets, enormous data sizes, large-scale and evolving system configurations, shifting loads, and heterogeneous technologies of big data and cloud platforms.

The objective of the First Workshop on Big Data Benchmarking (WBDB2012) [15] was to identify key issues and to launch an activity around the definition of reference benchmarks that can capture the essence of big data application scenarios. Attendees included academic and industry researchers and practitioners with backgrounds in big data, database systems, benchmarking and system performance, cloud storage and computing, and related areas. Each attendee was required to submit a two-page abstract on their vision for big data benchmarking. The program committee reviewed the abstracts and classified them into four categories: data generation; benchmark properties; benchmark process; and hardware and software aspects of big data workloads.

1.1 Workshop description

The workshop was held on May 8-9, 2012 in San Jose, CA, hosted by Brocade at their Executive Briefing Center. There were a total of about 60 attendees representing 45 different organizations (see the table in the Appendix for the list of attendees and their affiliations). The workshop structure was similar on both days of the meeting, with presentations in the morning sessions followed by discussions in the afternoon. Each day started with three 15-minute talks. On the first day, these talks provided background on industry benchmarking efforts and standards and discussed desirable attributes and properties of competitive industry benchmarks. The 15-minute talks on the second day focused on big data applications and the different genres of big data, such as genomic and geospatial data. On both days, these opening presentations were followed by 20 lightning talks of 5 minutes each by the invited attendees. The lightning talk format proved to be extremely effective for putting forward and sharing key technical ideas among a group of experts. The group of about 60 attendees was then divided arbitrarily into two equal groups for the afternoon discussion session.
Both groups were given the same topic outline for discussion, and ideas from both groups were collated at the end of the day. The workshop program committee (also listed in the Appendix) was responsible for drawing up the list of invitees; issuing invitations; designing the workshop structure described above; grouping white papers into lightning talk sessions; and developing the outline to guide the afternoon discussions.

The rest of this paper is organized as follows. Section 2 provides background on the nature of big data, big data applications, and some benchmark-related issues. Section 3 discusses guiding principles for designing big data benchmarks. Section 4 discusses the objectives of big data benchmarking, and whether such benchmarks should be targeted specifically at encouraging technological innovation or primarily at vendor competition. Section 5 goes into some of the details related to big data benchmarks. Section 6 presents conclusions from the workshop discussions and points to next steps in the process. The paper also includes an appendix with the list of attendees and the workshop program and organizing committee.

2 Background

There was general agreement at the workshop that a big data benchmarking activity should begin at the end-application level, by attempting to characterize the end-to-end needs and requirements of big data applications. While isolating individual steps of such an application, such as, say, sorting, is also of interest, this should still be done in the context of the broader application scenarios. Furthermore, there is a range of data genres that needs to be considered for big data, including, for example, structured, semi-structured, and unstructured data; graphs; streams; genomic data; array-based data; and geospatial data. It is important to model the core set of operations relevant to each genre while also looking for similarities across genres. Interestingly, it may be possible to identify relevant application scenarios that involve a variety of data genres and require a range of big data processing capabilities. An example discussed at the workshop was that of data management at an Internet-scale business, say, an enterprise like Facebook or Netflix. One could construct a plausible use case for such applications that requires big data capabilities for managing data streams (click streams), weblogs, text sorting and indexing, graph construction and traversal, as well as some geospatial and structured data processing. It is also possible that a single application scenario may not plausibly capture all the key aspects, for genres as well as operations on data, that are broadly relevant to big data applications. Thus, there may be a need for multiple benchmark definitions, based on differing scenarios, which together would capture the full range of possibilities.
There are good examples of successful benchmarks created by benchmark consortia, such as those of the Transaction Processing Performance Council (TPC) [12] and the Standard Performance Evaluation Corporation (SPEC) [9]; benchmarks from industry leaders, such as VMmark and the Top500; as well as benchmarks like TeraSort and Graph500 for specific operations and data genres (sorting and social graphs, respectively). It is fair to ask whether a new big data benchmark could simply be built upon existing benchmark efforts, by extending the current benchmark definitions appropriately. While this may be possible, a number of issues need to be considered, including whether: the existing benchmark models relevant application scenarios; the existing benchmark can be naturally and easily scaled to the large data volumes that will be required for big data benchmarking; such benchmarks can be used more or less as-is, without requiring significant re-engineering to produce data and queries (operations) with the right set of characteristics for big data applications; and the benchmark has inherent restrictions or limitations, such as, say, a requirement to implement all queries to the system in SQL. Several extant benchmarking efforts were mentioned and discussed, such as the Statistical Workload Injector for MapReduce (SWIM), developed at the University of California, Berkeley [10]; GridMix3, developed at Yahoo! [1]; YCSB++, developed

at Carnegie Mellon University on top of Yahoo!'s YCSB [11]; and TPC-DS, the latest addition to the TPC's suite of decision support benchmarks [7][5][3].

3 Design Principles for Big Data Benchmarks

As mentioned, some benchmarks have gained acceptance and are commonly used in industry, including the ones from TPC (e.g., TPC-C [14], TPC-H [4]), SPEC, and Top500. Some of these benchmarks have demonstrated longevity: TPC-C is almost 25 years old, and the Top500 list is just celebrating 20 years. The workshop identified desirable features of benchmarks, some of which apply to these benchmarks and may well have contributed to their longevity. In the case of the Top500, the metric is simple to understand: the result is a simple rank ordering according to a single performance number (FLOPS). TPC-C, an On-Line Transaction Processing (OLTP) workload, possesses characteristics common to all TPC benchmarks: it models an application domain; employs strict rules for disclosure and publication of results; uses third-party auditing of results; and publishes performance as well as price/performance metrics. A key desirable aspect of TPC-C is the self-scaling nature of the benchmark, which may be essential for big data benchmarks. TPC-C requires that as the performance number (i.e., transactions per minute) increases, the size of the database (i.e., the number of warehouses in the reference database) must also increase: a new warehouse must be introduced for every 12.5 tpmC. In this sense, TPC-C is a smoothly scaling benchmark with respect to database size. Other benchmarks, such as, say, TPC-H, specify fixed, discrete scale factors at which the benchmark runs are measured. The advantage of that approach is that there are multiple, directly comparable results at a given scale factor. A desirable feature of a big data benchmark would be comparability of results across scale factors, at least for neighboring values.
It would also be beneficial if results from a given scale factor could be used to extrapolate performance to the next scale factor (up or down). While there is an argument to be made for simplicity of benchmark results, i.e., publishing only one or very few numbers as the overall result of the benchmark, the desirability of extrapolating results from one scale factor to the next may also mean that the benchmark run must produce an adequate number of reported metrics to allow such extrapolations to be done.

One definition of big data is that it is data at a scale and velocity at which traditional methods of data management and data processing are inadequate. Given the technology-centric nature of this definition, and the fact that there are increasingly many different approaches to large-scale data management, a big data benchmark should be as technology-agnostic as possible. Thus, the benchmark itself cannot presume the use of any specific technology, say, the use of a relational database and/or SQL as the query language. The applications, reference data, and workload have to be specified at a level of abstraction (e.g., English) that does not presuppose a particular technological approach.
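The TPC-C self-scaling rule described above lends itself to a one-line calculation. The sketch below is purely illustrative and not part of any TPC kit: it computes the minimum warehouse count implied by a target throughput, using the 12.5 tpmC-per-warehouse limit from the specification.

```python
import math

MAX_TPMC_PER_WAREHOUSE = 12.5  # TPC-C caps reported throughput per warehouse

def min_warehouses(target_tpmc: float) -> int:
    """Smallest warehouse count that can legally sustain target_tpmc."""
    return math.ceil(target_tpmc / MAX_TPMC_PER_WAREHOUSE)

# A result of 1,000,000 tpmC requires a database of at least 80,000
# warehouses, so the database size grows smoothly with the reported metric.
print(min_warehouses(1_000_000))  # -> 80000
```

A fixed-scale-factor benchmark like TPC-H inverts this relationship: it fixes the database size first and reports throughput measured at that size.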

Finally, there was discussion at the workshop about the cost of running benchmarks, especially big data benchmarks, which, by definition, may require large installations and a significant amount of preparation. The discussion quickly converged on the view that it would not be possible to extrapolate the performance of large systems running large data sets from performance numbers obtained on small systems running small data sets. To be successful, the benchmark should be simple to run. This includes the ability to easily generate the large benchmark datasets; robust scripts for running the tests; perhaps readily available implementations of the benchmark in alternative technologies, e.g., RDBMS and Hadoop; and an easy way to verify the correctness of benchmark results.

4 Benchmark for Innovation versus Competition

The nature of a benchmarking activity is influenced heavily by its ultimate objective, which can be characterized as technical versus marketing (or competitive). The goals of a technical benchmarking activity are primarily to test different technological alternatives for solving a given problem. Such benchmarks focus much more on collecting detailed technical information and using it, for example, to optimize system implementations. A competitive benchmark, in contrast, is focused on comparing performance as well as price/performance among competing products, and may require an auditing process as part of the benchmark to ensure fair competition. The workshop discussion identified that the two are not mutually exclusive activities. The conclusion was that benchmarks need to be designed initially for competitive purposes, to compare among alternative solutions. However, once such benchmarks become successful (such as, say, the TPC benchmarks), there will be an impetus within organizations to use those same benchmarks for innovation as well. Vendors will be interested in developing features in their products that can perform well on such competitive benchmarks.
There are numerous examples in the arena of database software where product features and improvements were motivated, at least in part, by the desire to perform well in a given benchmark competition. Since a well-designed benchmark suite would reflect real-world needs, this means that the product improvements end up serving the needs of real applications. In sum, a big data benchmark would be utilized for both purposes, competition as well as innovation, though the benchmark should establish itself initially as a relevant competitive benchmark. The primary audience for big data benchmarks at this stage is the end customers who need guidance in deciding what types of systems to acquire to serve their big data needs. Acceptance of a big data benchmark for that purpose would then automatically lead to the use of the benchmark internally by vendors for innovation as well. While industry participants are more interested in competitive benchmarking, participants from academia are more interested in benchmarking for technical innovation. Also, a benchmark whose primary audience is end customers must be specified in lay terms, i.e., in a manner that allows even non-technical audiences to grasp the essence of the benchmark and relate it to their real-world application scenarios. At

one end, the benchmark workload could be expressed in English, while at the other it could be completely encoded in a computer program (written in, say, Java or C++). Given the primary target audience, the preference is for the former (English) rather than the latter. The benchmark must appeal to a broad audience and thus must be expressible in English, without requiring knowledge of a programming language just to understand the benchmark specification. Using English would provide the most flexibility and the broadest audience, though some parts of the specification may still employ a declarative language like SQL.

5 Benchmark Design

In this section, we summarize the outcome of the discussion sessions on benchmark design.

5.1 Component vs. End-to-End Benchmarks

One can identify two types of benchmarks: ones that measure the performance of individual components, whether components of an application or components of a system, and others that measure end-to-end, or full-system, performance. We refer to the first type as Component Benchmarks and the second type as End-to-End Benchmarks. The term system in this context may refer to a physical item, e.g., a server, or a software product, e.g., a database management system. Component benchmarks measure the performance of one (or a few) components of an entire system with respect to a certain workload. They tend to be relatively easy to specify and run, given their focused, limited scope. For components that expose standardized interfaces (APIs), the benchmarks can be specified in a standardized way and could be run as-is, i.e., using a benchmark kit. An example of a component benchmark is SPEC's CPU benchmark (the latest version is CPU2006 [8]), a CPU-intensive benchmark suite that exercises a system's processor, memory subsystem, and compiler. Another example of a component benchmark is TeraSort [16], since sorting is a common component operation in many end-to-end application scenarios.
Several workshop attendees observed that TeraSort is a very useful benchmark and has served an extremely useful purpose in helping to exercise and tune large-scale systems. The strength of the TeraSort benchmark is its simplicity. End-to-end benchmarks measure the performance of entire systems; they can be more difficult to specify, and to provide as a benchmark kit that can be run as-is, due to various system dependencies and the intrinsic complexity of the benchmark itself. The TPC benchmarks in general are good examples of such end-to-end benchmarks, including for OLTP (TPC-C and TPC-E) and decision support (TPC-H and TPC-DS). TPC-DS, for instance, measures a system's ability to load a database and serve a variety of requests, including ad hoc queries, report generation, OLAP, and data mining queries, in the presence of continuous data integration activity, on a system that includes servers, IO subsystems, and staging areas for data integration.
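The contrast between the two benchmark types can be made concrete with a toy component benchmark in the spirit of TeraSort (illustrative only; the real TeraSort sorts 100-byte records across a distributed cluster). It isolates a single operation, sorting synthetic records, and reports one throughput number, which is exactly what makes component benchmarks easy to specify and run:

```python
import random
import time

def generate_records(n: int, seed: int = 42) -> list[bytes]:
    """Deterministically generate n random 10-byte records."""
    rng = random.Random(seed)
    return [rng.randbytes(10) for _ in range(n)]

def run_sort_benchmark(n: int) -> float:
    """Time one isolated operation (sorting) and return records/second."""
    records = generate_records(n)
    start = time.perf_counter()
    records.sort()
    elapsed = time.perf_counter() - start
    return n / elapsed

print(f"{run_sort_benchmark(100_000):,.0f} records/s")
```

An end-to-end benchmark, by contrast, would have to time loading, a query mix, and concurrent data integration together, which is why providing a portable, run-as-is kit for it is much harder.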

5.2 Big Data Applications

Big data issues impinge upon a wide range of application domains, from scientific to commercial applications. Applications that generate very large amounts of data include scientific applications such as the Large Hadron Collider (LHC) and the Sloan Digital Sky Survey (SDSS), as well as social websites such as Facebook, Twitter, and LinkedIn, which are the often-quoted examples of big data. However, more traditional areas such as retail business, e.g., Amazon, eBay, and Walmart, have also reached a situation where they need to deal with big data. Thus, it may be difficult to find a single application that covers all extant flavors of big data processing. There is also the question of whether a big data benchmark should attempt to model a concrete application, or whether an abstracted model based on real applications, and hence a generic benchmark, would be more desirable. The benefit of a concrete application is that one could use real-world examples as a blueprint for modeling the benchmark data and workload. This makes a detailed specification possible, which helps the reader understand the benchmark and its results. It also helps in relating real-world businesses and their workloads to the benchmark. One possible application for a big data benchmark is a retailer model, such as that used in TPC-DS and many other benchmarks, which has the additional advantage of being very well understood and researched and of providing many real-world scenarios, especially in the area of semantic web data analysis. Another model application is a social website. Large social websites and related services, such as Facebook, Twitter, Netflix, and others, deal with all different kinds of big data and transformations. To find a suitable model, the key characteristics of big data applications have to be considered. Big data applications typically operate on data in several stages, using a data processing pipeline.
An abstract example of such a pipeline is shown in Fig. 3.

Fig. 3: Big Data Analytic Pipeline (source: Agrawal et al., "Challenges and Opportunities with Big Data," white paper)
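In code, such a staged pipeline might look like the following minimal sketch. The stage names follow the pipeline in Fig. 3 (acquisition, extraction/cleaning, integration, analysis); the records and functions are invented placeholders, not part of any proposed benchmark:

```python
def acquire() -> list[str]:
    """Acquisition: pull raw records (here, a hard-coded sample)."""
    return ["1,alice,30", "2,bob,", "2,bob,25"]

def extract_clean(raw: list[str]) -> list[dict]:
    """Extraction/cleaning: parse records and drop incomplete ones."""
    rows = []
    for line in raw:
        rid, name, age = line.split(",")
        if age:  # discard records with a missing age field
            rows.append({"id": int(rid), "name": name, "age": int(age)})
    return rows

def integrate(rows: list[dict]) -> dict:
    """Integration: deduplicate by key, keeping the last record seen."""
    return {r["id"]: r for r in rows}

def analyze(table: dict) -> float:
    """Analysis: compute an aggregate over the integrated data."""
    ages = [r["age"] for r in table.values()]
    return sum(ages) / len(ages)

# An end-to-end big data benchmark would measure the whole chain,
# not any single stage in isolation.
print(analyze(integrate(extract_clean(acquire()))))  # -> 27.5
```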

5.3 Data Sources

A key issue to consider is the source of benchmark data. Should the benchmark be based on real data taken from an actual, real-world application, or on synthetic data? The conclusion was to rely on synthetic data produced by parallel data generator programs (in order to generate very large datasets efficiently), while ensuring that the data have some real-world characteristics embedded in them. The use of reference datasets is not practical, since it would mean downloading and storing extremely large reference datasets from some location. Furthermore, real datasets would reflect only certain properties in the data and not others. But foremost, it is very difficult to scale reference data sets (see also the next section). A synthetic dataset can be designed to incorporate the necessary characteristics in the data. The different genres of big data will require corresponding data generators.

5.4 Scaling

An important issue will be the ability to scale results up or down based on results obtained with the system under test (SUT). Assuming a big data benchmark allows for a scalable data set and system size, the very nature of big data may well lead to an arms race: in order to top a competitor's result, a vendor may double the hardware, which will in turn provoke the competitor to follow suit. Such a scenario is not desirable, as it may hide hardware and software performance improvements over time and force smaller competitors out of the game because of the huge financial burden that large configurations impose on the test sponsor. Arguably, one characteristic of TPC benchmarking is that the SUT tends to be over-specified in terms of the hardware configuration employed. Vendors (aka benchmark sponsors) are willing to take an additional hit on total system cost in order to achieve better performance numbers without too much of a negative impact on the price/performance numbers.
For example, the SUT may employ, say, 10x the amount of disk for a given benchmark database size, whereas real customer installations may employ only 3-4x the amount of disk. However, there is a possibility that with big data benchmarking the SUT may actually be smaller in size than the actual customer installation, given the expense of pulling together a system for big data applications. Thus, there might be a need to extrapolate (scale up) system performance based on the benchmark results. The set of measurements taken during benchmarking should therefore be adequate to enable such scaling. Another scaling aspect that needs to be considered is the notion of elasticity in cloud computing. How well does the system perform when the data size is dynamically increased, or when more resources (processors, disks) are added to the system? And how well does the system perform when some resources are removed as a consequence of a failure, e.g., node or disk failures? TPC benchmarks require a series of Atomicity, Consistency, Isolation, and Durability (ACID) tests to be performed on the SUT. For big data benchmarking, the notion would be to run such ACID tests, specific

failure tests more suited to big data systems, or a combination of both, depending on the type of systems being targeted by the benchmark.

5.5 Metrics

Big data benchmark metrics should include performance metrics as well as cost-based metrics (price/performance). The TPC is the forerunner in setting rules for specifying the prices of benchmark configurations. Over the years, the TPC has learned the hard way how difficult it is to specify how the hardware and software of benchmark configurations are priced (see [2] for a detailed discussion of this topic). The TPC finally settled on a canonical way to measure the price of benchmark configurations and defined a pricing specification to which all TPC benchmarks are required to adhere. The TPC model can be adopted for pricing. Other costs that need attention are energy costs and system setup costs, since big data configurations may be very large in scale and setup may be a significant factor in the overall cost.

6 Conclusions and Next Steps

The Workshop on Big Data Benchmarking, held on May 8-9, 2012 in San Jose, CA, discussed a number of issues related to big data benchmarking definitions and processes. The workshop concluded that there is both a need and an opportunity to define benchmarks for big data applications. Such a benchmark would model end-to-end application scenarios and consider a variety of costs, including total system cost, setup cost, and energy costs. Benchmark metrics would include performance metrics as well as cost metrics. Several next steps are underway. An informal big data benchmarking community has been formed. Biweekly phone conferences are being used to keep this group engaged and to share information among members. A few members of this community are beginning to work on simple examples of end-to-end benchmarks. Efforts are underway to obtain funding to support a pilot benchmarking activity.
The next Big Data Benchmarking workshop is already being planned, to be held on December 17-18, 2012 in Pune, India, hosted by Persistent Systems. A third workshop is being planned for July 2013 in China.

7 Bibliography

1. GridMix3: git://git.apache.org/hadoop-mapreduce.git/src/contrib/gridmix/
2. Karl Huppler: Price and the TPC. TPCTC 2010.
3. Meikel Pöss, Bryan Smith, Lubor Kollár, Per-Åke Larson: TPC-DS, Taking Decision Support Benchmarking to the Next Level. SIGMOD Conference 2002.
4. Meikel Pöss, Chris Floyd: New TPC Benchmarks for Decision Support and Web Commerce. SIGMOD Record 29(4), 2000.
5. Meikel Pöss, Raghunath Othayoth Nambiar, David Walrath: Why You Should Run TPC-DS: A Workload Analysis. VLDB 2007.
6. Pricing Specification, Version 1.7.0.
7. Raghunath Othayoth Nambiar, Meikel Poess: The Making of TPC-DS. VLDB 2006.

8. SPEC CPU2006.
9. Standard Performance Evaluation Corporation.
10. Statistical Workload Injector for MapReduce (SWIM).
11. Swapnil Patil, Milo Polte, Kai Ren, Wittawat Tantisiriroj, Lin Xiao, Julio López, Garth Gibson, Adam Fuchs, Billie Rinaldi: YCSB++: Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores. SOCC 2011, October 27-28, 2011, Cascais, Portugal.
12. Transaction Processing Performance Council.
13. Transaction Processing Performance Council: TPC Benchmark DS. Standard (2012).
14. Trish Hogan: Overview of TPC Benchmark E: The Next Generation of OLTP Benchmarks. TPCTC 2009.
15. Workshop on Big Data Benchmarking.
16. Jim Gray: Sort Benchmark Home Page.

Appendix A: Workshop Participants

1. Anderson, Dave, Seagate
2. Balazinski, Magda, U. Washington
3. Bond, Andrew, Red Hat
4. Borthakur, Dhruba, Facebook
5. Carey, Michael, UC Irvine
6. Chandar, Vinoth, LinkedIn
7. Chen, Yanpei, UC Berkeley/Cloudera
8. Cutforth, Craig, Seagate
9. Daniel, Stephen, NetApp
10. Dunning, Ted, MapR/Mahout
11. EmirFarinas, Hulya, Greenplum
12. Engelbrecht, Andries, HP
13. Ferber, Dan, WhamCloud
14. Foltyn, Marty, CLDS/SDSC
15. Galloway, John
16. Ghazal, Ahmad, Teradata Corporation
17. Gowda, Bhaskar, Intel
18. Graefe, Goetz, HP
19. Gupta, Amarnath, SDSC
20. Hawkins, Ron, CLDS/SDSC
21. Honavar, Vasant, NSF
22. Jackson, Jeff, Shell
23. Jacobsen, Arno, MSRG
24. Jost, Gabriele, AMD
25. Kelly, Mark, Convey Computer
26. Kent, Paul, SAS
27. Kocberber, Onur, EPFL
28. Kolcz, Alek, Twitter
29. Koren, Dan, Actian
30. Krishnan, Anu, Shell
31. Krneta, Paul, BMMsoft
32. Manegold, Stefan, CWI/Monet
33. Mankovskii, Serge, CA Labs
34. Miles, Casey, Brocade
35. O'Malley, Owen, Hortonworks
36. Raab, Francois, InfoSizing
37. Schork, Nicholas, Scripps Research Institute
38. Shainer, Gilad, Mellanox
39. Sheahan, Dennis, Netflix
40. Shekhar, Shashi, U Minnesota
41. Short, James, CLDS/SDSC
42. Skow, Dane, NSF
43. Szalay, Alex, Johns Hopkins
44. Taheri, Reza, VMware
45. Vanderbilt, Richard, NetApp/OpenSFS
46. Wakou, Nicholas, Dell
47. Wyatt, Len, Microsoft
48. Yoder, Alan, SNIA
49. Zhao, Jerry, Google

Appendix B: Program/Organizing Committee

1. Baru, Chaitan, CLDS/SDSC
2. Bhandarkar, Milind, Greenplum
3. Gutkind, Eyal, Mellanox
4. Hawkins, Ron, CLDS/SDSC
5. Nambiar, Raghunath, Cisco
6. Osterberg, Ken, Seagate
7. Pearson, Scott, Brocade
8. Plale, Beth, Indiana U / HathiTrust Research Foundation
9. Poess, Meikel, Oracle
10. Rabl, Tilmann, MSRG, U. Toronto


AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this

More information

QLogic 16Gb Gen 5 Fibre Channel for Database and Business Analytics

QLogic 16Gb Gen 5 Fibre Channel for Database and Business Analytics QLogic 16Gb Gen 5 Fibre Channel for Database Assessment for Database and Business Analytics Using the information from databases and business analytics helps business-line managers to understand their

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information

More information

Benchmarking and Analysis of NoSQL Technologies

Benchmarking and Analysis of NoSQL Technologies Benchmarking and Analysis of NoSQL Technologies Suman Kashyap 1, Shruti Zamwar 2, Tanvi Bhavsar 3, Snigdha Singh 4 1,2,3,4 Cummins College of Engineering for Women, Karvenagar, Pune 411052 Abstract The

More information

SPEC Research Group. Sam Kounev. SPEC 2015 Annual Meeting. Austin, TX, February 5, 2015

SPEC Research Group. Sam Kounev. SPEC 2015 Annual Meeting. Austin, TX, February 5, 2015 SPEC Research Group Sam Kounev SPEC 2015 Annual Meeting Austin, TX, February 5, 2015 Standard Performance Evaluation Corporation OSG HPG GWPG RG Open Systems Group High Performance Group Graphics and Workstation

More information

Apache Hadoop Patterns of Use

Apache Hadoop Patterns of Use Community Driven Apache Hadoop Apache Hadoop Patterns of Use April 2013 2013 Hortonworks Inc. http://www.hortonworks.com Big Data: Apache Hadoop Use Distilled There certainly is no shortage of hype when

More information

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

More information

IMPROVED FAIR SCHEDULING ALGORITHM FOR TASKTRACKER IN HADOOP MAP-REDUCE

IMPROVED FAIR SCHEDULING ALGORITHM FOR TASKTRACKER IN HADOOP MAP-REDUCE IMPROVED FAIR SCHEDULING ALGORITHM FOR TASKTRACKER IN HADOOP MAP-REDUCE Mr. Santhosh S 1, Mr. Hemanth Kumar G 2 1 PG Scholor, 2 Asst. Professor, Dept. Of Computer Science & Engg, NMAMIT, (India) ABSTRACT

More information

Big Data on Microsoft Platform

Big Data on Microsoft Platform Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1 Contents 1. What is Big Data?...3 2. Characteristics of Big Data...3 3. Enter Hadoop...3 4. Microsoft Big Data Solutions...4

More information

An Approach to Implement Map Reduce with NoSQL Databases

An Approach to Implement Map Reduce with NoSQL Databases www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh

More information

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Introduction to Big Data! with Apache Spark UC#BERKELEY# Introduction to Big Data! with Apache Spark" UC#BERKELEY# This Lecture" The Big Data Problem" Hardware for Big Data" Distributing Work" Handling Failures and Slow Machines" Map Reduce and Complex Jobs"

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

Hadoop s Entry into the Traditional Analytical DBMS Market. Daniel Abadi Yale University August 3 rd, 2010

Hadoop s Entry into the Traditional Analytical DBMS Market. Daniel Abadi Yale University August 3 rd, 2010 Hadoop s Entry into the Traditional Analytical DBMS Market Daniel Abadi Yale University August 3 rd, 2010 Data, Data, Everywhere Data explosion Web 2.0 more user data More devices that sense data More

More information

NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB

NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB bankmark UG (haftungsbeschränkt) Bahnhofstraße 1 9432 Passau Germany www.bankmark.de [email protected] T +49 851 25 49 49 F +49 851 25 49 499 NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB,

More information

A Brief Outline on Bigdata Hadoop

A Brief Outline on Bigdata Hadoop A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is

More information

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...

More information

Data Refinery with Big Data Aspects

Data Refinery with Big Data Aspects International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data

More information

BIG DATA IN BUSINESS ENVIRONMENT

BIG DATA IN BUSINESS ENVIRONMENT Scientific Bulletin Economic Sciences, Volume 14/ Issue 1 BIG DATA IN BUSINESS ENVIRONMENT Logica BANICA 1, Alina HAGIU 2 1 Faculty of Economics, University of Pitesti, Romania [email protected] 2 Faculty

More information

How to Enhance Traditional BI Architecture to Leverage Big Data

How to Enhance Traditional BI Architecture to Leverage Big Data B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...

More information

Concept and Project Objectives

Concept and Project Objectives 3.1 Publishable summary Concept and Project Objectives Proactive and dynamic QoS management, network intrusion detection and early detection of network congestion problems among other applications in the

More information

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research

More information

Big Data Technologies Compared June 2014

Big Data Technologies Compared June 2014 Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development

More information

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and

More information

Information Architecture

Information Architecture The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to

More information

How To Scale Out Of A Nosql Database

How To Scale Out Of A Nosql Database Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI

More information

Development of a Computational and Data-Enabled Science and Engineering Ph.D. Program

Development of a Computational and Data-Enabled Science and Engineering Ph.D. Program Development of a Computational and Data-Enabled Science and Engineering Ph.D. Program Paul T. Bauman, Varun Chandola, Abani Patra Matthew Jones University at Buffalo, State University of New York EduHPC

More information

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation

More information

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database

Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Cisco UCS and Fusion- io take Big Data workloads to extreme performance in a small footprint: A case study with Oracle NoSQL database Built up on Cisco s big data common platform architecture (CPA), a

More information

BIG DATA-AS-A-SERVICE

BIG DATA-AS-A-SERVICE White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers

More information

BIG DATA USING HADOOP

BIG DATA USING HADOOP + Breakaway Session By Johnson Iyilade, Ph.D. University of Saskatchewan, Canada 23-July, 2015 BIG DATA USING HADOOP + Outline n Framing the Problem Hadoop Solves n Meet Hadoop n Storage with HDFS n Data

More information

Big Data Database Revenue and Market Forecast, 2012-2017

Big Data Database Revenue and Market Forecast, 2012-2017 Wikibon.com - http://wikibon.com Big Data Database Revenue and Market Forecast, 2012-2017 by David Floyer - 13 February 2013 http://wikibon.com/big-data-database-revenue-and-market-forecast-2012-2017/

More information

Stress-Testing Clouds for Big Data Applications

Stress-Testing Clouds for Big Data Applications Stress-Testing Clouds for Big Data Applications Alexandru Şutîi Stress-Testing Clouds for Big Data Applications Master s Thesis in Computer Science System Architecture and Networking Group Faculty of

More information

Big Data - Infrastructure Considerations

Big Data - Infrastructure Considerations April 2014, HAPPIEST MINDS TECHNOLOGIES Big Data - Infrastructure Considerations Author Anand Veeramani / Deepak Shivamurthy SHARING. MINDFUL. INTEGRITY. LEARNING. EXCELLENCE. SOCIAL RESPONSIBILITY. Copyright

More information

NoSQL and Hadoop Technologies On Oracle Cloud

NoSQL and Hadoop Technologies On Oracle Cloud NoSQL and Hadoop Technologies On Oracle Cloud Vatika Sharma 1, Meenu Dave 2 1 M.Tech. Scholar, Department of CSE, Jagan Nath University, Jaipur, India 2 Assistant Professor, Department of CSE, Jagan Nath

More information

DELL s Oracle Database Advisor

DELL s Oracle Database Advisor DELL s Oracle Database Advisor Underlying Methodology A Dell Technical White Paper Database Solutions Engineering By Roger Lopez Phani MV Dell Product Group January 2010 THIS WHITE PAPER IS FOR INFORMATIONAL

More information

Object Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved.

Object Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved. Object Storage: A Growing Opportunity for Service Providers Prepared for: White Paper 2012 Neovise, LLC. All Rights Reserved. Introduction For service providers, the rise of cloud computing is both a threat

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

Manifest for Big Data Pig, Hive & Jaql

Manifest for Big Data Pig, Hive & Jaql Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,

More information

Assignment # 1 (Cloud Computing Security)

Assignment # 1 (Cloud Computing Security) Assignment # 1 (Cloud Computing Security) Group Members: Abdullah Abid Zeeshan Qaiser M. Umar Hayat Table of Contents Windows Azure Introduction... 4 Windows Azure Services... 4 1. Compute... 4 a) Virtual

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy?

Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy? HPC2012 Workshop Cetraro, Italy Supercomputing and Big Data: Where are the Real Boundaries and Opportunities for Synergy? Bill Blake CTO Cray, Inc. The Big Data Challenge Supercomputing minimizes data

More information

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization

More information

Comprehending the Tradeoffs between Deploying Oracle Database on RAID 5 and RAID 10 Storage Configurations. Database Solutions Engineering

Comprehending the Tradeoffs between Deploying Oracle Database on RAID 5 and RAID 10 Storage Configurations. Database Solutions Engineering Comprehending the Tradeoffs between Deploying Oracle Database on RAID 5 and RAID 10 Storage Configurations A Dell Technical White Paper Database Solutions Engineering By Sudhansu Sekhar and Raghunatha

More information

GigaSpaces Real-Time Analytics for Big Data

GigaSpaces Real-Time Analytics for Big Data GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and

More information

Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture

Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture He Huang, Shanshan Li, Xiaodong Yi, Feng Zhang, Xiangke Liao and Pan Dong School of Computer Science National

More information

Hadoop & its Usage at Facebook

Hadoop & its Usage at Facebook Hadoop & its Usage at Facebook Dhruba Borthakur Project Lead, Hadoop Distributed File System [email protected] Presented at the Storage Developer Conference, Santa Clara September 15, 2009 Outline Introduction

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

Data Warehousing in the Age of Big Data

Data Warehousing in the Age of Big Data Data Warehousing in the Age of Big Data Krish Krishnan AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD * PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY TOKYO Morgan Kaufmann is an imprint of Elsevier

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze

More information

One-Size-Fits-All: A DBMS Idea Whose Time has Come and Gone. Michael Stonebraker December, 2008

One-Size-Fits-All: A DBMS Idea Whose Time has Come and Gone. Michael Stonebraker December, 2008 One-Size-Fits-All: A DBMS Idea Whose Time has Come and Gone Michael Stonebraker December, 2008 DBMS Vendors (The Elephants) Sell One Size Fits All (OSFA) It s too hard for them to maintain multiple code

More information

Dell Reference Configuration for Hortonworks Data Platform

Dell Reference Configuration for Hortonworks Data Platform Dell Reference Configuration for Hortonworks Data Platform A Quick Reference Configuration Guide Armando Acosta Hadoop Product Manager Dell Revolutionary Cloud and Big Data Group Kris Applegate Solution

More information

Proact whitepaper on Big Data

Proact whitepaper on Big Data Proact whitepaper on Big Data Summary Big Data is not a definite term. Even if it sounds like just another buzz word, it manifests some interesting opportunities for organisations with the skill, resources

More information

IEEE BigData 2014 Tutorial on Big Data Benchmarking

IEEE BigData 2014 Tutorial on Big Data Benchmarking IEEE BigData 2014 Tutorial on Big Data Benchmarking Dr. Tilmann Rabl Middleware Systems Research Group, University of Toronto [email protected] Dr. Chaitan Baru San Diego Supercomputer Center, University

More information

An Oracle White Paper May 2012. Oracle Database Cloud Service

An Oracle White Paper May 2012. Oracle Database Cloud Service An Oracle White Paper May 2012 Oracle Database Cloud Service Executive Overview The Oracle Database Cloud Service provides a unique combination of the simplicity and ease of use promised by Cloud computing

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload

Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Dell Cloudera Syncsort Data Warehouse Optimization ETL Offload Drive operational efficiency and lower data transformation costs with a Reference Architecture for an end-to-end optimization and offload

More information

CA Technologies Big Data Infrastructure Management Unified Management and Visibility of Big Data

CA Technologies Big Data Infrastructure Management Unified Management and Visibility of Big Data Research Report CA Technologies Big Data Infrastructure Management Executive Summary CA Technologies recently exhibited new technology innovations, marking its entry into the Big Data marketplace with

More information

A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM

A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM Ramesh Maharjan and Manoj Shakya Department of Computer Science and Engineering Dhulikhel, Kavre, Nepal [email protected],

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

Sunnie Chung. Cleveland State University

Sunnie Chung. Cleveland State University Sunnie Chung Cleveland State University Data Scientist Big Data Processing Data Mining 2 INTERSECT of Computer Scientists and Statisticians with Knowledge of Data Mining AND Big data Processing Skills:

More information