Which SQL Engine Leads the Herd?

October 2014
A comparison of three leading SQL-on-Hadoop implementations for compatibility, performance and scalability

Contents

- Executive Summary
- The case for SQL on Hadoop
- Standards are job one
- Evaluating SQL-on-Hadoop solutions
- SQL-on-Hadoop: not all are equal
- Beware of cherry pickers
- Performance matters too
- Big SQL means Big Investment Protection
- Value beyond compliance
- So Who leads the herd?
- For More Information

Executive Summary

In an independently audited benchmark of three popular SQL-on-Hadoop implementations, IBM showed that Hadoop is ready to run OLAP and complex query workloads at a fraction of the cost of traditional systems, provided you choose the right technology.

With so many vendors making claims about the performance and compatibility of SQL-on-Hadoop, IBM decided to put leading distributions to the test, conducting the first ever Hadoop-DS benchmark. The test compared IBM's Big SQL with Cloudera's Impala and Hortonworks' Hive. Hadoop-DS is a Hadoop Decision Support benchmark developed by IBM and modeled after the highly regarded Transaction Processing Performance Council Decision Support (TPC-DS) benchmark. To help make the process fair, IBM established three competing teams, each running one Hadoop distribution on identical hardware configurations. IBM engaged the services of an independent TPC professional to audit and help document the result. Among the key findings were:

Compatibility matters. IBM's Big SQL was the only solution able to execute all 99 TPC-DS queries at scale with minor modifications permissible under TPC rules. Cloudera's Impala was able to run 52 queries, and Hive ran 58 queries [1], in a manner that complied with the TPC rules. For the remaining queries, rewrites were required (some extensive), and some queries could not be made to run at all. Lack of SQL compatibility is a nuisance in a benchmark, but could be a costly disaster in a production environment.

Throughput matters. When comparing Big SQL against the subset of queries that could actually run on competing platforms, Big SQL ran queries on average 3.6 times faster than Impala and 5.4 times faster than Hive on a 10 TB scale test [2]. Not only was Big SQL the only engine able to run the full Hadoop-DS workload, it also ran the workload significantly faster in both the single-user and multi-user tests.

Scale matters. Hadoop is about big data, after all. IBM had originally planned to compare all three vendors at a 30 TB scale, but achieving stability at scale was a challenge. While Big SQL could reliably execute all queries at a 30 TB scale, the competitive platforms could not, exhibiting various run-time errors. The comparison was made instead at the 10 TB scale, where results could be repeated and thus audited.

These findings are compelling. Not only was IBM's Big SQL the only Hadoop solution tested that was able to actually run the complete set of queries, but it was found to be the fastest, the most scalable, and the most reliable as well.

[1] These results refer to initial testing at 1 GB scale for compatibility. At 10 TB scale, both Impala and Hive ran fewer queries.
[2] This result is based on the Hadoop-DS single-user run. Performance results for single-user and multi-user runs are detailed in a separate benchmark report.

The case for SQL on Hadoop

In today's data centers SQL has become a ubiquitous way to access and manipulate data. No longer a tool used just by developers and database administrators, today most professionals and analysts have at least some knowledge of SQL or use tools that rely on SQL as a standard. While non-structured data types get all the attention for big data workloads, the majority of real projects involve transactional or log data [3], data formats generally well suited to manipulation with SQL.

While Hive was the only game in town just a few years ago, today there are at least a dozen competing commercial and open source efforts around SQL-on-Hadoop. Vendors are competing based on performance, compatibility, and the ability to scale to support real-world production workloads.

Standards are job one

Standards are important in every industry. They help reduce cost, expand markets, spur innovation, reduce risk and generally give organizations a competitive edge. This is true of SQL as well. In almost every organization, SQL is at the heart of enterprise data used in transactional systems, data warehouses, columnar databases and analytics platforms, to name just a few examples. Additionally, a vast number of commercial and in-house developed tools used to access, manipulate and visualize data rely on SQL. SQL is the lifeblood of modern transaction and decision support systems.

The last thing an organization wants to do is introduce technology that is not compatible with what it already has. It's nice to be able to use open source software, but at the end of the day standardization is what matters most: the software needs to work.

Evaluating SQL-on-Hadoop solutions

As customers know, testing their own applications is the only benchmark that matters, but when it comes to a standard benchmark, the TPC Benchmark DS (TPC-DS) is among the most thorough.
TPC-DS is a decision support benchmark that models several aspects of the business operations of a global retailer. Comprising 99 separate queries, it models real-world business operations that companies in this and other industries would find familiar. While there are no official results at the time of this writing, TPC-DS is widely regarded as a fair and complete benchmark. The rigor and realism of the benchmark make it almost impossible for vendors to game, as long as they properly adhere to the benchmark specification and rules.

[3] 70% of 465 survey respondents cite transactional data as a primary target for big data initiatives. Gartner research note, "Survey Analysis - Big Data Adoption in 2013 Shows Substance Behind the Hype", Sept. Analysts: Lisa Kart, Nick Heudecker, Frank Buytendijk.

SQL-on-Hadoop systems cannot meet several of the technology requirements of the TPC-DS benchmark, so IBM modeled the Hadoop-DS benchmark on TPC-DS, using the same data sets and queries but not performing data maintenance operations, enforcing referential integrity, or meeting other benchmark requirements that are not feasible on Hadoop systems.

The benchmark is designed to model systems where operational data is used both to make business decisions quickly and to direct long-range planning and operation. The queries involved fall broadly into four categories:

- Reporting queries
- Ad-hoc queries
- Iterative OLAP queries
- Data mining queries

Because the sizes of businesses vary, the benchmark is also designed to scale to model different sizes of warehouses. Standard scale sizes are 100GB, 300GB, 1TB, 3TB, 10TB, 30TB and 100TB.

SQL-on-Hadoop: not all are equal

One of the first hurdles in conducting the benchmark is simply getting the queries to run across all three Hadoop environments. From this point of view, not all SQL-on-Hadoop implementations are created equal.

As shown in Figure 1, in initial testing IBM Big SQL was able to run all 99 of the standard TPC-DS queries after building the dataset. 87 queries ran out of the box, and an additional 12 were modified within a few hours with minor syntax changes allowable under the TPC-DS benchmark specification [4].

[4] Section 4.2 of the TPC-DS specification, available from tpc.org, provides rules around what types of modifications are permissible and which are not.

Figure 1: Query compatibility by SQL-on-Hadoop solution

Other distributions did not fare so well. In the case of Cloudera's Impala, 35 queries ran unmodified, 17 required minor modifications complying with TPC-DS rules, and 36 required more extensive non-compliant modifications. More concerning was that some of the 99 queries could not be run at all, either because no rewrite of the query was found or because the queries failed at run time.

In the case of Hive 0.13 the situation was similar. 32 queries ran out of the box, an additional 26 queries ran with compliant modifications, and 13 queries could be rewritten with non-compliant modifications. As the team scaled up the size of the test, however, queries that worked at smaller scale stopped working. At a 10TB dataset size, 30 of the queries would not run at all.

This exercise highlights the challenge that customers can expect to encounter when seeking to adapt existing SQL schemas and applications to SQL-on-Hadoop implementations.

Beware of cherry pickers

Vendors have been making many performance claims related to the TPC-DS benchmark, cherry picking queries from the suite of 99 and publishing only those that happen to work and show an offering in the best possible light. In some cases, vendors have even altered table schemas to avoid compatibility issues or boost performance. Clearly this is not a proper way to run a benchmark. In fact, the rules of the benchmark specifically forbid this practice.

The real news is not that selected queries can be made to run faster, but that many of the ANSI SQL queries in the benchmark simply don't run at all on competitive platforms. You can just imagine the challenges associated with getting your own production application running on a database platform that doesn't support 50% of your standard queries. This would amount to a rewrite of the application, introducing risk, added costs and certain delays.

Performance matters too

A full comparison between Impala, Hive and Big SQL could not be made because Hive and Impala could only run a subset of the queries. It was still possible, though, to compare results for the common set of 46 queries that all distributions were able to run at a 10 TB scale.
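
To put the size of those subsets in perspective, the compatibility counts reported above from the initial 1 GB testing can be tallied in a few lines. The following is a minimal sketch in Python, not part of the benchmark kit: the dictionary keys and the summarize helper are hypothetical names chosen for illustration, the zero for Big SQL's non-compliant rewrites is implied by the text rather than stated, and the "could not run" figure is derived by subtraction on the assumption that the three reported categories do not overlap.

```python
# Query counts as quoted in the text (initial testing at 1 GB scale).
# Key names are hypothetical, chosen only for this illustration.
TOTAL_QUERIES = 99

reported = {
    # Big SQL: 87 out of the box + 12 compliant changes = 99, so 0 rewrites is implied.
    "Big SQL": {"unmodified": 87, "compliant_changes": 12, "noncompliant_rewrites": 0},
    "Impala": {"unmodified": 35, "compliant_changes": 17, "noncompliant_rewrites": 36},
    "Hive 0.13": {"unmodified": 32, "compliant_changes": 26, "noncompliant_rewrites": 13},
}


def summarize(counts):
    """Derive compliant, ran-at-all and could-not-run totals for one engine."""
    compliant = counts["unmodified"] + counts["compliant_changes"]
    ran_at_all = compliant + counts["noncompliant_rewrites"]
    return {
        "compliant": compliant,
        "ran_at_all": ran_at_all,
        # Assumes the three reported categories are disjoint.
        "could_not_run": TOTAL_QUERIES - ran_at_all,
    }


summary = {engine: summarize(counts) for engine, counts in reported.items()}

for engine, row in summary.items():
    share = 100 * row["compliant"] / TOTAL_QUERIES
    print(
        f"{engine}: {row['compliant']} compliant ({share:.0f}% of {TOTAL_QUERIES}), "
        f"{row['ran_at_all']} ran with any modification, "
        f"{row['could_not_run']} could not be run"
    )
```

Under those assumptions, the tally reproduces the 52 and 58 compliant queries cited in the executive summary for Impala and Hive.
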
Figure 2 shows a direct comparison of the elapsed time for the common set of queries across the three distributions. Although Big SQL was able to run all 99 queries, this comparison includes only the queries that Hive and Impala were also able to run. IBM's Big SQL completed all the common queries in 48 minutes and 28 seconds, while Impala took 2 hours, 55 minutes and 36 seconds. Hive 0.13 exhibited the worst performance, running the queries in 4 hours, 25 minutes and 49 seconds.
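
The elapsed times quoted above can be converted to seconds, the unit used in Figure 2, and compared against the Big SQL total. The sketch below is illustrative arithmetic in Python, and the variable names are hypothetical. Note that it computes the ratio of total elapsed times, which need not match a per-query average speedup exactly.

```python
from datetime import timedelta

# Elapsed times for the common set of 46 queries at 10 TB, as quoted in the text.
elapsed = {
    "Big SQL": timedelta(minutes=48, seconds=28),
    "Impala": timedelta(hours=2, minutes=55, seconds=36),
    "Hive 0.13": timedelta(hours=4, minutes=25, seconds=49),
}

# Convert each duration to whole seconds.
seconds = {engine: int(t.total_seconds()) for engine, t in elapsed.items()}

# Ratio of each engine's total elapsed time to the Big SQL total.
baseline = seconds["Big SQL"]
ratios = {engine: s / baseline for engine, s in seconds.items()}

for engine in elapsed:
    print(f"{engine}: {seconds[engine]:,} s, {ratios[engine]:.2f}x the Big SQL total")
```

The totals work out to 2,908, 10,536 and 15,949 seconds, so Impala and Hive took roughly 3.6 and 5.5 times as long as Big SQL on the common query set.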

Figure 2: Time in seconds to run the common set of 46 queries on each SQL-on-Hadoop platform (Big SQL, Impala, Hive)

The same ranking was observed in the multi-user test, which consisted of four concurrent query streams.

Big SQL means Big Investment Protection

The good news for IBM customers is that Big SQL offers significant investment protection. Not only was Big SQL the only SQL-on-Hadoop engine tested that was able to run all the queries, it was also the fastest, and the only engine to scale to a 30 TB dataset size. What's even better is that customers don't need to compromise on standards. Rigorous SQL compatibility means that customers can:

- Leverage existing investments in software, tools and people skills
- Run existing applications where appropriate over SQL-on-Hadoop data stores
- Run queries faster, more efficiently and at a larger scale, translating into lower operating costs

Value beyond compliance

Standards compliance and performance are essential, but it is also important that a chosen SQL implementation play nice with others. In Hadoop, playing nice means a number of things:

- Supporting open data formats
- Using standard client-side database drivers
- Supporting built-in functions that SQL users expect
- Providing sophisticated security capabilities
- Providing federated access to multiple data sources

Open, standard data formats

SQL is a useful language. Thanks to standardization and decades of maturation, it is well known and adept at solving many problems. It is not, however, the only language available, nor is it always the best solution for every problem. Hadoop has an ever-expanding array of languages and tools for analyzing large datasets, but to be able to use these rich tools, data needs to exist in standard Hadoop data formats. Hadoop enthusiasts will be pleased to know that there is no such thing as a Big SQL data store.
A table defined in Hive is a table defined in Big SQL and vice versa. Big SQL supports 100% native HDFS file formats. This is not true of all distributions, and customers should be wary of SQL-on-Hadoop implementations that introduce their own proprietary metadata.

Common client-side drivers

Beyond the ability to share SQL across platforms, Big SQL supports standard IBM client drivers, allowing the same set of standards-compliant JDBC, JCC, ODBC, CLI and .NET drivers to be used across multiple databases and operating systems. Clients using these drivers can access IBM Big SQL, DB2, IBM Informix and third-party database environments transparently. Combining a standard SQL implementation with industry-standard drivers enlarges the number of ISV applications that can interact seamlessly with IBM's Big SQL.

Built-in functions drive productivity

Having a SQL implementation that merely works is different from having an environment that makes users productive. Incorporating a rich library of over 250 built-in functions along with SQL OLAP functions, Big SQL is built for analytics. It provides advanced features including sub-query support, additional SQL types and global (session) variables. With these additional features users can simply do more things, and answer more questions, from within the SQL environment. When using other SQL solutions that lack these features, users may find themselves writing custom code to implement capabilities already built into Big SQL.

Security and auditing

For some SQL implementations, security is an afterthought. Big SQL was built with security in mind. User authentication is handled using standard mechanisms including LDAP and Kerberos, so that Big SQL fits seamlessly into your enterprise environment. Big SQL supports flexible authorization controls based on users, groups and roles. It uses standard SQL GRANT and REVOKE syntax familiar to database administrators. In addition to basic table-level access controls, Big SQL supports fine-grained row and column level access controls (RCAC). Fine-grained access control and features like data masking help expand the range of solutions to which Big SQL is applicable. In addition to flexible authentication and authorization, Big SQL also provides extensive auditing facilities. In short, Big SQL brings the rich security features that RDBMS administrators expect to the world of Hadoop.

Federated queries

In modern data centers, data seldom exists in one place. Some data will exist in relational databases, and other data will be in data warehouses or specialized column-oriented databases. Big SQL supports rich federation capabilities, allowing users to write queries that access not only Hadoop-based data but other databases as well. A single query may join data from Big SQL, Hive on Hadoop, a table on a Teradata warehouse, and data from an Oracle database.

So Who leads the herd?

The findings of this benchmark are compelling. Organizations are heavily invested in SQL. The last thing customers need is one SQL dialect for Hadoop and another for their existing database environments.
The fact that Big SQL was the only SQL-on-Hadoop implementation able to actually run the Hadoop-DS workload is important. The fact that it is also faster, more scalable, more stable, and has a richer set of features is very impressive indeed.

- Big SQL was the only implementation able to run the full Hadoop-DS benchmark with all 99 queries
- Big SQL delivered over three times the performance of the nearest competitor in the single-user test
- Big SQL was the only offering able to scale to 30 TB and run the full workload at that scale

This result should not be surprising. IBM invented SQL, after all, and has over 30 years of experience building SQL query engines and optimizers. When it comes to SQL-on-Hadoop, IBM InfoSphere BigInsights with Big SQL clearly leads the herd.

For More Information

To learn more about Big SQL, download the free IBM whitepaper "SQL-on-Hadoop without compromise" at source=sw-infomgt&s_pkg=ov23626 To try IBM Big SQL for free, download IBM's free InfoSphere BigInsights QuickStart Edition, or run BigInsights in the cloud at

Please note: Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

Copyright IBM Corporation 2014
IBM Canada, 3600 Steeles Ave East, Markham, Ontario L3R 9Z7
Produced in Canada, October 2014
All Rights Reserved

IBM, the IBM logo, ibm.com, BigInsights, Cognos, DB2, Informix, InfoSphere, PureData and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

TPC Benchmark, TPC-DS, and QphDS are trademarks of Transaction Processing Performance Council. Cloudera, the Cloudera logo, and Cloudera Impala are trademarks of Cloudera. Hortonworks is a trademark of Hortonworks Inc. Hadoop and Hive are trademarks of the Apache Software Foundation. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. UNIX is a registered trademark of The Open Group in the United States and other countries. Other company, product and service names may be trademarks or service marks of others. References in this publication to IBM products and services do not imply that IBM intends to make them available in all countries in which IBM operates. IMW14799-USEN-00

IBM BigInsights for Apache Hadoop

IBM BigInsights for Apache Hadoop IBM BigInsights for Apache Hadoop Efficiently manage and mine big data for valuable insights Highlights: Enterprise-ready Apache Hadoop based platform for data processing, warehousing and analytics Advanced

More information

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based

More information

The IBM Cognos Platform for Enterprise Business Intelligence

The IBM Cognos Platform for Enterprise Business Intelligence The IBM Cognos Platform for Enterprise Business Intelligence Highlights Optimize performance with in-memory processing and architecture enhancements Maximize the benefits of deploying business analytics

More information

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems

IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems IBM InfoSphere Guardium Data Activity Monitor for Hadoop-based systems Proactively address regulatory compliance requirements and protect sensitive data in real time Highlights Monitor and audit data activity

More information

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics

Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Name: Srinivasan Govindaraj Title: Big Data Predictive Analytics Please note the following IBM s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice

More information

IBM Software Information Management Creating an Integrated, Optimized, and Secure Enterprise Data Platform:

IBM Software Information Management Creating an Integrated, Optimized, and Secure Enterprise Data Platform: Creating an Integrated, Optimized, and Secure Enterprise Data Platform: IBM PureData System for Transactions with SafeNet s ProtectDB and DataSecure Table of contents 1. Data, Data, Everywhere... 3 2.

More information

AtScale Intelligence Platform

AtScale Intelligence Platform AtScale Intelligence Platform PUT THE POWER OF HADOOP IN THE HANDS OF BUSINESS USERS. Connect your BI tools directly to Hadoop without compromising scale, performance, or control. TURN HADOOP INTO A HIGH-PERFORMANCE

More information

IBM InfoSphere Optim Test Data Management solution for Oracle E-Business Suite

IBM InfoSphere Optim Test Data Management solution for Oracle E-Business Suite IBM InfoSphere Optim Test Data Management solution for Oracle E-Business Suite Streamline test-data management and deliver reliable application upgrades and enhancements Highlights Apply test-data management

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

Big Data Analytics with IBM Cognos BI Dynamic Query IBM Redbooks Solution Guide

Big Data Analytics with IBM Cognos BI Dynamic Query IBM Redbooks Solution Guide Big Data Analytics with IBM Cognos BI Dynamic Query IBM Redbooks Solution Guide IBM Cognos Business Intelligence (BI) helps you make better and smarter business decisions faster. Advanced visualization

More information

IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances

IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances IBM Software Business Analytics Cognos Business Intelligence IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances 2 IBM Cognos 10: Enhancing query processing performance for

More information

IBM Cognos TM1. Enterprise planning, budgeting and analysis. Highlights. IBM Software Data Sheet

IBM Cognos TM1. Enterprise planning, budgeting and analysis. Highlights. IBM Software Data Sheet IBM Software IBM Cognos TM1 Enterprise planning, budgeting and analysis Highlights Reduces planning cycles by as much as 75% and reporting from days to minutes Owned and managed by Finance and lines of

More information

IBM InfoSphere Optim Test Data Management

IBM InfoSphere Optim Test Data Management IBM InfoSphere Optim Test Data Management Highlights Create referentially intact, right-sized test databases or data warehouses Automate test result comparisons to identify hidden errors and correct defects

More information

Big Data Strategies with IMS

Big Data Strategies with IMS Big Data Strategies with IMS #16103 Richard Tran IMS Development richtran@us.ibm.com Insert Custom Session QR if Desired. Agenda Big Data in an Information Driven economy Why start with System z IMS strategies

More information

Big Data Management and Security

Big Data Management and Security Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value

More information

Blistering Fast SQL Access to Hadoop using. IBM BigInsights 3.0 with Big SQL 3.0

Blistering Fast SQL Access to Hadoop using. IBM BigInsights 3.0 with Big SQL 3.0 Blistering Fast SQL Access to Hadoop using IBM BigInsights 3.0 with Big SQL 3.0 SQL-over-Hadoop implementations are ready to execute OLAP complex query workloads at a fraction of the cost of traditional

More information

IBM Cognos Performance Management Solutions for Oracle

IBM Cognos Performance Management Solutions for Oracle IBM Cognos Performance Management Solutions for Oracle Gain more value from your Oracle technology investments Highlights Deliver the power of predictive analytics across the organization Address diverse

More information

Splice Machine: SQL-on-Hadoop Evaluation Guide www.splicemachine.com

Splice Machine: SQL-on-Hadoop Evaluation Guide www.splicemachine.com REPORT Splice Machine: SQL-on-Hadoop Evaluation Guide www.splicemachine.com The content of this evaluation guide, including the ideas and concepts contained within, are the property of Splice Machine,

More information

IBM InfoSphere BigInsights Enterprise Edition

IBM InfoSphere BigInsights Enterprise Edition IBM InfoSphere BigInsights Enterprise Edition Efficiently manage and mine big data for valuable insights Highlights Advanced analytics for structured, semi-structured and unstructured data Professional-grade

More information

Big Data projects and use cases. Claus Samuelsen IBM Analytics, Europe csa@dk.ibm.com

Big Data projects and use cases. Claus Samuelsen IBM Analytics, Europe csa@dk.ibm.com Big projects and use cases Caus Samuesen IBM Anaytics, Europe csa@dk.ibm.com IBM Sofware Overview of BigInsights IBM BigInsights Scientist Free Quick Start (non production): IBM Open Patform BigInsights

More information

IBM Cognos Analysis for Microsoft Excel

IBM Cognos Analysis for Microsoft Excel IBM Software Group Data Sheet IBM Cognos Analysis for Microsoft Excel Highlights Explore and analyze trusted and secure BI data in a familiar spreadsheet format Develop high frequency and high priority

More information

Big Data and Market Surveillance. April 28, 2014

Big Data and Market Surveillance. April 28, 2014 Big Data and Market Surveillance April 28, 2014 Copyright 2014 Scila AB. All rights reserved. Scila AB reserves the right to make changes to the information contained herein without prior notice. No part

More information

IBM InfoSphere Optim Test Data Management Solution

IBM InfoSphere Optim Test Data Management Solution IBM InfoSphere Optim Test Data Management Solution Highlights Create referentially intact, right-sized test databases Automate test result comparisons to identify hidden errors Easily refresh and maintain

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

PROGRESS DATADIRECT QA AND PERFORMANCE TESTING EXTENSIVE TESTING ENSURES DATA CONNECTIVITY THAT WORKS

PROGRESS DATADIRECT QA AND PERFORMANCE TESTING EXTENSIVE TESTING ENSURES DATA CONNECTIVITY THAT WORKS Progress DataDirect Connect DATA SHEET PROGRESS DATADIRECT QA AND PERFORMANCE TESTING EXTENSIVE TESTING ENSURES DATA CONNECTIVITY THAT WORKS Progress DataDirect ODBC, JDBC and ADO.NET data connectivity

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Actian SQL in Hadoop Buyer s Guide

Actian SQL in Hadoop Buyer s Guide Actian SQL in Hadoop Buyer s Guide Contents Introduction: Big Data and Hadoop... 3 SQL on Hadoop Benefits... 4 Approaches to SQL on Hadoop... 4 The Top 10 SQL in Hadoop Capabilities... 5 SQL in Hadoop

More information

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance

IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance Data Sheet IBM Cognos 8 Business Intelligence Analysis Discover the factors driving business performance Overview Multidimensional analysis is a powerful means of extracting maximum value from your corporate

More information

Control-M for Hadoop. Technical Bulletin. www.bmc.com

Control-M for Hadoop. Technical Bulletin. www.bmc.com Technical Bulletin Control-M for Hadoop Version 8.0.00 September 30, 2014 Tracking number: PACBD.8.0.00.004 BMC Software is announcing that Control-M for Hadoop now supports the following: Secured Hadoop

More information

IBM Cognos Business Intelligence Scorecarding

IBM Cognos Business Intelligence Scorecarding IBM Cognos Business Intelligence Scorecarding Successfully linking strategy to operations Overview Scorecarding offers a proven approach to communicating business strategy throughout the organization and

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

WA2192 Introduction to Big Data and NoSQL EVALUATION ONLY

WA2192 Introduction to Big Data and NoSQL EVALUATION ONLY WA2192 Introduction to Big Data and NoSQL Web Age Solutions Inc. USA: 1-877-517-6540 Canada: 1-866-206-4644 Web: http://www.webagesolutions.com The following terms are trademarks of other companies: Java

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

Tap into Hadoop and Other No SQL Sources

Tap into Hadoop and Other No SQL Sources Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

Rocket AS v6.3. Benefits of upgrading

Rocket AS v6.3. Benefits of upgrading Rocket AS v6.3 Benefits of upgrading What is Rocket AS? Rocket AS for IBM System z provides query, reporting, data visualization and rapid application development for System z data including DB2. With

More information

Focus on the business, not the business of data warehousing!

Focus on the business, not the business of data warehousing! Focus on the business, not the business of data warehousing! Adam M. Ronthal Technical Product Marketing and Strategy Big Data, Cloud, and Appliances @ARonthal 1 Disclaimer Copyright IBM Corporation 2014.


Big Data and Apache Hadoop Adoption:

Big Data and Apache Hadoop Adoption: Expert Reference Series of White Papers Big Data and Apache Hadoop Adoption: Key Challenges and Rewards 1-800-COURSES www.globalknowledge.com Big Data and Apache Hadoop Adoption: Key Challenges and Rewards


How To Write A Bigbench Benchmark For A Retailer

How To Write A Bigbench Benchmark For A Retailer BigBench Overview Towards a Comprehensive End-to-End Benchmark for Big Data - bankmark UG (haftungsbeschränkt) 02/04/2015 @ SPEC RG Big Data The BigBench Proposal End to end benchmark Application level


How To Set Up An Ibm Marketing Management System

How To Set Up An Ibm Marketing Management System IBM Enterprise Marketing Management 9.1.2 Recommended Software Environments and Minimum System Requirements 9/23/2015 IBM Corporation Copyright Copyright IBM 2015 IBM Corporation B1WA LKG1 550 King Street


Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com

Apache Sentry. Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Apache Sentry Prasad Mujumdar prasadm@apache.org prasadm@cloudera.com Agenda Various aspects of data security Apache Sentry for authorization Key concepts of Apache Sentry Sentry features Sentry architecture


ORACLE DATA INTEGRATOR ENTERPRISE EDITION

ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION ORACLE DATA INTEGRATOR ENTERPRISE EDITION KEY FEATURES Out-of-box integration with databases, ERPs, CRMs, B2B systems, flat files, XML data, LDAP, JDBC, ODBC Knowledge


Performance and Scalability Overview

Performance and Scalability Overview Performance and Scalability Overview This guide provides an overview of some of the performance and scalability capabilities of the Pentaho Business Analytics platform. PENTAHO PERFORMANCE ENGINEERING


Actian Vector in Hadoop

Actian Vector in Hadoop Actian Vector in Hadoop Industrialized, High-Performance SQL in Hadoop A Technical Overview Contents Introduction...3 Actian Vector in Hadoop - Uniquely Fast...5 Exploiting the CPU...5 Exploiting Single


WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution WHITEPAPER A Technical Perspective on the Talena Data Availability Management Solution BIG DATA TECHNOLOGY LANDSCAPE Over the past decade, the emergence of social media, mobile, and cloud technologies


Supported Platforms. HP Vertica Analytic Database. Software Version: 7.0.x

Supported Platforms. HP Vertica Analytic Database. Software Version: 7.0.x HP Vertica Analytic Database Software Version: 7.0.x Document Release Date: 5/7/2014 Legal Notices Warranty The only warranties for HP products and services are set forth in the express warranty statements


In-Memory Analytics for Big Data

In-Memory Analytics for Big Data In-Memory Analytics for Big Data Game-changing technology for faster, better insights WHITE PAPER SAS White Paper Table of Contents Introduction: A New Breed of Analytics... 1 SAS In-Memory Overview...


IBM Software Top tips for securing big data environments

IBM Software Top tips for securing big data environments IBM Software Top tips for securing big data environments Why big data doesn't have to mean big security challenges 2 Top Comprehensive tips for securing data big protection data environments for physical,


Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to


ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process

ORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process ORACLE OLAP KEY FEATURES AND BENEFITS FAST ANSWERS TO TOUGH QUESTIONS EASILY KEY FEATURES & BENEFITS World class analytic engine Superior query performance Simple SQL access to advanced analytics Enhanced


HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics

HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop


and Hadoop Technology

and Hadoop Technology SAS and Hadoop Technology Overview SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS and Hadoop Technology: Overview. Cary, NC: SAS Institute


Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/


Why Big Data in the Cloud?

Why Big Data in the Cloud? Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data


Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely


Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard The Evolution of Analytics Mainframe EDW Proprietary MPP Unix SMP MPP Appliance Hadoop? Questions Is Hadoop


Information management software solutions White paper. Powerful data warehousing performance with IBM Red Brick Warehouse

Information management software solutions White paper. Powerful data warehousing performance with IBM Red Brick Warehouse Information management software solutions White paper Powerful data warehousing performance with IBM Red Brick Warehouse April 2004 Page 1 Contents 1 Data warehousing for the masses 2 Single step load


Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.


IBM Content Integrator Enterprise Edition, Version 8.5.1

IBM Content Integrator Enterprise Edition, Version 8.5.1 IBM Software Information Management IBM Content Integrator Enterprise Edition, Version 8.5.1 Highlights Enriches portals and key business applications with federated access to content stored in multiple


SQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box)

SQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box) SQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box) SQL Server White Paper Published: January 2012 Applies to: SQL Server 2012 Summary: This paper explains the different ways in which databases


Apache Hadoop: The Platform for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twitter: @awadallah

Apache Hadoop: The Platform for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twitter: @awadallah Apache Hadoop: The Platform for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twitter: @awadallah 1 The Problems with Current Data Systems BI Reports + Interactive Apps RDBMS (aggregated


Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances INSIGHT Oracle's All-Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA


How To Use Hp Vertica Ondemand

How To Use Hp Vertica Ondemand Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater


Introducing Oracle Exalytics In-Memory Machine

Introducing Oracle Exalytics In-Memory Machine Introducing Oracle Exalytics In-Memory Machine Jon Ainsworth Director of Business Development Oracle EMEA Business Analytics 1 Copyright 2011, Oracle and/or its affiliates. All rights Agenda Topics Oracle


Oracle Big Data Strategy Simplified Infrastructure

Oracle Big Data Strategy Simplified Infrastructure Big Data Oracle Big Data Strategy Simplified Infrastructure Selim Burduroğlu Global Innovation Evangelist & Architect Education & Research Industry Business Unit Oracle Confidential Internal/Restricted/Highly


Big Data for the Rest of Us Technical White Paper

Big Data for the Rest of Us Technical White Paper Big Data for the Rest of Us Technical White Paper Treasure Data - Big Data for the Rest of Us 1 Introduction The importance of data warehousing and analytics has increased as companies seek to gain competitive


Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.


Vectorwise 3.0 Fast Answers from Hadoop. Technical white paper

Vectorwise 3.0 Fast Answers from Hadoop. Technical white paper Vectorwise 3.0 Fast Answers from Hadoop Technical white paper 1 Contents Executive Overview 2 Introduction 2 Analyzing Big Data 3 Vectorwise and Hadoop Environments 4 Vectorwise Hadoop Connector 4 Performance


An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct


IBM Tivoli Storage FlashCopy Manager

IBM Tivoli Storage FlashCopy Manager IBM Storage FlashCopy Manager Online, near-instant snapshot backup and restore of critical business applications Highlights Perform near-instant application-aware snapshot backup and restore, with minimal


ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web


Einsatzfelder von IBM PureData Systems und Ihre Vorteile.

Einsatzfelder von IBM PureData Systems und Ihre Vorteile. Einsatzfelder von IBM PureData Systems und Ihre Vorteile demirkaya@de.ibm.com Agenda Information technology challenges PureSystems and PureData introduction PureData for Transactions PureData for Analytics


Discovering Business Insights in Big Data Using SQL-MapReduce

Discovering Business Insights in Big Data Using SQL-MapReduce Discovering Business Insights in Big Data Using SQL-MapReduce A Technical Whitepaper Rick F. van der Lans Independent Business Intelligence Analyst R20/Consultancy July 2013 Sponsored by Copyright 2013


User Pass-Through Authentication in IBM Cognos 8 (SSO to data sources)

User Pass-Through Authentication in IBM Cognos 8 (SSO to data sources) User Pass-Through Authentication in IBM Cognos 8 (SSO to data sources) Nature of Document: Guideline Product(s): IBM Cognos 8 BI Area of Interest: Security Version: 1.2 2 Copyright and Trademarks Licensed


Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013 Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache


Oracle's Big Data solutions. Roger Wullschleger.

Oracle's Big Data solutions. Roger Wullschleger. Oracle's Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline


Scenario 2: Cognos SQL and Native SQL.

Scenario 2: Cognos SQL and Native SQL. Proven Practice Scenario 2: Cognos SQL and Native SQL. Product(s): IBM Cognos ReportNet and IBM Cognos 8 Area of Interest: Performance Scenario 2: Cognos SQL and Native SQL. 2 Copyright Copyright 2008


IBM InfoSphere Optim Data Masking solution

IBM InfoSphere Optim Data Masking solution IBM InfoSphere Optim Data Masking solution Mask data on demand to protect privacy across the enterprise Highlights: Safeguard personally identifiable information, trade secrets, financials and other sensitive


Building Your Big Data Team

Building Your Big Data Team Building Your Big Data Team With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements.


IBM WebSphere Application Server Family

IBM WebSphere Application Server Family IBM IBM Family Providing the right application foundation to meet your business needs Highlights Build a strong foundation and reduce costs with the right application server for your business needs Increase


Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future


Oracle Big Data Management System

Oracle Big Data Management System Oracle Big Data Management System A Statement of Direction for Big Data and Data Warehousing Platforms ORACLE STATEMENT OF DIRECTION APRIL 2015 Disclaimer The following is


How to Deliver Measurable Business Value with the Enterprise CMDB

How to Deliver Measurable Business Value with the Enterprise CMDB How to Deliver Measurable Business Value with the Enterprise CMDB James Moore jdmoore@us.ibm.com Product Manager, Business Service, Netcool/Impact 2010 IBM Corporation Agenda What is a CMDB? What are CMDB


Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?


OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT WHITEPAPER OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT A top-tier global bank's end-of-day risk analysis jobs didn't complete in time for the next start of trading day. To solve


How to Choose Between Hadoop, NoSQL and RDBMS

How to Choose Between Hadoop, NoSQL and RDBMS How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A


Teradata's Big Data Technology Strategy & Roadmap

Teradata's Big Data Technology Strategy & Roadmap Teradata's Big Data Technology Strategy & Roadmap Artur Borycki, Director International Solutions Marketing 18 March 2014 Agenda > Introduction and level-set > Enabling the Logical Data Warehouse > Any


The New Promise of Business Intelligence

The New Promise of Business Intelligence IBM Software Group White Paper Business Analytics The New Promise of Business Intelligence 2 The New Promise of Business Intelligence Introduction: A lot has changed since business intelligence first came


Next-Generation Cloud Analytics with Amazon Redshift

Next-Generation Cloud Analytics with Amazon Redshift Next-Generation Cloud Analytics with Amazon Redshift What's inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional


Delivering Real-World Total Cost of Ownership and Operational Benefits

Delivering Real-World Total Cost of Ownership and Operational Benefits Delivering Real-World Total Cost of Ownership and Operational Benefits Treasure Data - Delivering Real-World Total Cost of Ownership and Operational Benefits 1 Background Big Data is traditionally thought


Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools


Microsoft's SQL Server Parallel Data Warehouse Provides High Performance and Great Value

Microsoft's SQL Server Parallel Data Warehouse Provides High Performance and Great Value Microsoft's SQL Server Parallel Data Warehouse Provides High Performance and Great Value Published by: Value Prism Consulting Sponsored by: Microsoft Corporation Publish date: March 2013 Abstract: Data


Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov

Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA by Christian Tzolov @christzolov Whoami Christian Tzolov Technical Architect at Pivotal, BigData, Hadoop, SpringXD,


So What's the Big Deal?

So What's the Big Deal? So What's the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data


Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Copyright 2012, Oracle and/or its affiliates. All rights reserved. 1 Oracle Big Data Appliance Releases 2.5 and 3.0 Ralf Lange Global ISV & OEM Sales Agenda Quick Overview on BDA and its Positioning Product Details and Updates Security and Encryption New Hadoop Versions


ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION EXECUTIVE SUMMARY Oracle business intelligence solutions are complete, open, and integrated. Key components of Oracle business intelligence


Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
