Which SQL Engine Leads the Herd?
October 2014

A Comparison of three leading SQL-on-Hadoop Implementations for compatibility, performance and scalability
Contents
Executive Summary
The case for SQL on Hadoop
Standards are job one
Evaluating SQL-on-Hadoop solutions
SQL-on-Hadoop: not all are equal
Beware of cherry pickers
Performance matters too
Big SQL means Big Investment Protection
Value beyond compliance
So who leads the herd?
For More Information

Executive Summary

In an independently audited benchmark of three popular SQL-on-Hadoop implementations, IBM showed that Hadoop is ready to run OLAP and complex query workloads at a fraction of the cost of traditional systems; that is, if you choose the right technology.

With so many vendors making claims about the performance and compatibility of SQL-on-Hadoop, IBM decided to put leading distributions to the test, conducting the first-ever Hadoop-DS benchmark. The test compared IBM's Big SQL with Cloudera's Impala and Hortonworks' Hive. Hadoop-DS is a Hadoop Decision Support benchmark developed by IBM and modeled after the highly regarded Transaction Processing Performance Council Decision Support (TPC-DS) benchmark. To help make the process fair, IBM established three competing teams, each running one Hadoop distribution on identical hardware configurations. IBM also engaged the services of an independent TPC professional to audit and help document the results. Among the key findings were:

Compatibility matters
IBM's Big SQL was the only solution able to execute all 99 TPC-DS queries at scale, with only minor modifications permissible under TPC rules. Cloudera's Impala was able to run 52 queries, and Hive ran 58 queries [1], in a manner that complied with the TPC rules. For the remaining queries, rewrites were required (some extensive), and some queries could not be made to run at all. Lack of SQL compatibility is a nuisance in a benchmark, but could be a costly disaster in a production environment.

Throughput matters
When comparing Big SQL against the subset of queries that could actually run on competing platforms, IBM ran queries on average 3.6 times faster than Impala and 5.4 times faster than Hive on a 10 TB scale test [2]. Not only was Big SQL the only engine able to run the full Hadoop-DS workload, it also ran the workload significantly faster in both the single-user and multi-user Hadoop-DS tests.

Scale matters
Hadoop is about big data, after all. IBM had originally planned to compare all three vendors at a 30 TB scale, but achieving stability at that scale proved to be a challenge. While Big SQL could reliably execute all queries at a 30 TB scale, the competing platforms could not, exhibiting various run-time errors. The comparison was instead made at the 10 TB scale, where results could be repeated, and thus audited.

These findings are compelling. Not only was IBM's Big SQL the only Hadoop solution tested able to actually run the complete set of queries, but it was also found to be the fastest, the most scalable, and the most reliable.

[1] These results refer to initial testing at 1 GB scale for compatibility. At 10 TB scale, both Impala and Hive ran fewer queries.
[2] This result is based on the Hadoop-DS single-user run. Detailed performance results for the single-user and multi-user runs are provided in a separate benchmark report.
The case for SQL on Hadoop

In today's data centers, SQL has become a ubiquitous way to access and manipulate data. SQL is no longer a tool used just by developers and database administrators; today most professionals and analysts have at least some knowledge of SQL or use tools that rely on SQL as a standard. While unstructured data types get all the attention in big data workloads, the majority of real projects involve transactional or log data [3], data formats generally well suited to manipulation with SQL. While Hive was the only game in town just a few years ago, today there are at least a dozen competing commercial and open source efforts around SQL-on-Hadoop. Vendors are competing on performance, compatibility, and the ability to scale to support real-world production workloads.

Standards are job one

Standards are important in every industry. They help reduce cost, expand markets, spur innovation, reduce risk and generally give organizations a competitive edge. This is true of SQL as well. In almost every organization, SQL is at the heart of enterprise data used in transactional systems, data warehouses, columnar databases and analytics platforms, to name just a few examples. Additionally, a vast number of commercial and in-house tools used to access, manipulate and visualize data rely on SQL. SQL is the lifeblood of modern transaction processing and decision support systems. The last thing an organization wants to do is introduce technology that is not compatible with what it already has. It's nice to be able to use open source software, but at the end of the day standardization is what matters most: the software needs to work.

Evaluating SQL-on-Hadoop solutions

As customers know, testing their own applications is the only benchmark that matters, but when it comes to standard benchmarks, the TPC Benchmark DS (TPC-DS) is among the most thorough.
TPC-DS is a decision support benchmark that models several aspects of the business operations of a global retailer. Consisting of 99 separate queries, it models real-world business operations that companies in this and other industries would find familiar. While there were no official TPC-DS results at the time of this writing, TPC-DS is widely regarded as a fair and complete benchmark. The rigor and realism of the benchmark make it almost impossible for vendors to game it, as long as they properly adhere to the benchmark specification and rules.

[3] 70% of 465 survey respondents cite transactional data as a primary target for big data initiatives. Gartner research note, "Survey Analysis: Big Data Adoption in 2013 Shows Substance Behind the Hype", Sept. Analysts: Lisa Kart, Nick Heudecker, Frank Buytendijk.
SQL-on-Hadoop systems cannot meet several of the technology requirements of the TPC-DS benchmark, so IBM modeled the Hadoop-DS benchmark on TPC-DS, using the same data sets and queries but not performing data maintenance operations, not enforcing referential integrity, and not meeting other benchmark requirements that are not feasible with Hadoop systems. The benchmark is designed to model systems where operational data is used both to make business decisions quickly and to direct long-range planning and operation. The queries involved fall broadly into four categories:

- Reporting queries
- Ad-hoc queries
- Iterative OLAP queries
- Data mining queries

Because businesses vary in size, the benchmark is designed to scale to model warehouses of different sizes. Standard scale sizes are 100 GB, 300 GB, 1 TB, 3 TB, 10 TB, 30 TB and 100 TB.

SQL-on-Hadoop: not all are equal

One of the first hurdles in conducting the benchmark is simply getting the queries to run across all three Hadoop environments. From this point of view, not all SQL-on-Hadoop implementations are created equal. As shown in Figure 1, in initial testing IBM Big SQL was able to run all 99 of the standard TPC-DS queries after building the dataset. Of these, 87 queries ran out of the box, and an additional 12 were easily modified within a few hours with minor syntax changes allowable under the TPC-DS benchmark specification [4].

Figure 1: Query compatibility by SQL-on-Hadoop solution

Other distributions did not fare so well. In the case of Cloudera's Impala, 35 queries ran unmodified, 17 required minor modifications complying with TPC-DS rules, and 36 required more extensive non-compliant modifications. More concerning was that some of the 99 queries could not be run at all, either because no rewrite of the query was found or because the queries failed at run time. In the case of Hive 0.13 the situation was similar: 32 queries ran out of the box, an additional 26 queries ran with compliant modifications, and 13 queries could be rewritten with non-compliant modifications. As the team scaled up the size of the test, however, queries that had worked at a smaller scale stopped working. At a 10 TB dataset size, 30 of the queries would not run at all. This exercise highlights the challenges that customers can expect to encounter when adapting existing SQL schemas and applications to SQL-on-Hadoop implementations.

[4] Section 4.2 of the TPC-DS specification, available from tpc.org, provides rules on which types of modifications are permissible and which are not.

Beware of cherry pickers

Vendors have been making many performance claims related to the TPC-DS benchmark, cherry picking queries from the suite of 99 and publishing results only for those queries that happen to work and show an offering in the best possible light. In some cases, vendors have even altered table schemas to avoid compatibility issues or boost performance. Clearly this is not a proper way to run a benchmark; in fact, the rules of the benchmark specifically forbid this practice. The real news is not that selected queries can be made to run faster, but that many of the ANSI SQL queries in the benchmark simply don't run at all on competing platforms. Imagine the challenges of getting your own production application running on a database platform that doesn't support 50% of your standard queries. This would amount to a rewrite of the application, introducing risk, added costs and certain delays.

Performance matters too

A full comparison between Impala, Hive and Big SQL could not be made because Hive and Impala could run only a subset of the queries. It was still possible, though, to compare results for the common set of 46 queries that all distributions were able to run at a 10 TB scale.
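The 1 GB compatibility counts reported earlier for each engine can be tallied as a quick consistency check. The short Python sketch below does this. Note two assumptions: the "not run" figure is not stated in the text and is derived here by subtraction, assuming each of the 99 queries falls into exactly one category; and the zero non-compliant rewrites for Big SQL are implied by 87 + 12 = 99 rather than stated. The engine labels are illustrative.

```python
# Query counts from the initial 1 GB compatibility testing, as reported in the text.
# Assumption: each of the 99 queries falls into exactly one category, so the
# number that could not be run is derived by subtraction.
TOTAL_QUERIES = 99

reported = {
    # engine: (ran unmodified, compliant minor changes, non-compliant rewrites)
    "Big SQL": (87, 12, 0),  # 0 non-compliant is implied, not stated
    "Impala": (35, 17, 36),
    "Hive": (32, 26, 13),
}


def tally(unmodified: int, compliant: int, noncompliant: int,
          total: int = TOTAL_QUERIES) -> dict:
    """Group the counts into TPC-compliant, non-compliant, and derived 'not run'."""
    compliant_total = unmodified + compliant
    not_run = total - compliant_total - noncompliant
    return {"compliant": compliant_total,
            "non_compliant": noncompliant,
            "not_run": not_run}


results = {engine: tally(*counts) for engine, counts in reported.items()}

for engine, r in results.items():
    print(f"{engine:8} compliant={r['compliant']:3} "
          f"non-compliant={r['non_compliant']:3} not run={r['not_run']:3}")
```

The compliant totals (52 for Impala, 58 for Hive) agree with the figures quoted in the executive summary. The derived remainders (11 for Impala, 28 for Hive) apply only to the initial 1 GB testing, since the text reports that more queries failed at 10 TB.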
Figure 2 shows a direct comparison of the elapsed time for the common set of queries across the three distributions. Although Big SQL was able to run all 99 queries, only the queries that Hive and Impala could also run are included in this comparison. IBM's Big SQL completed all the common queries in 48 minutes and 28 seconds, while Impala took 2 hours, 55 minutes and 36 seconds. Hive 0.13 exhibited the worst performance, running the queries in 4 hours, 25 minutes and 49 seconds.
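The elapsed times above can be converted to seconds and compared directly. The Python sketch below does that arithmetic. It computes ratios of total elapsed time only; this is one way of expressing the gap and may not be the method behind the "on average" speedups quoted in the executive summary.

```python
def to_seconds(hours: int, minutes: int, seconds: int) -> int:
    """Convert an h/m/s duration to a total number of seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Elapsed times for the common set of 46 queries at 10 TB, as reported in the text.
elapsed = {
    "Big SQL": to_seconds(0, 48, 28),
    "Impala": to_seconds(2, 55, 36),
    "Hive": to_seconds(4, 25, 49),
}

# Ratio of each engine's total elapsed time to Big SQL's total.
baseline = elapsed["Big SQL"]
ratios = {engine: secs / baseline for engine, secs in elapsed.items()}

for engine, secs in elapsed.items():
    # A ratio above 1 means the engine took longer than Big SQL on the same queries.
    print(f"{engine:8} {secs:6d} s  {ratios[engine]:.2f}x Big SQL's elapsed time")
```

This gives 2,908, 10,536 and 15,949 seconds, so Impala and Hive took roughly 3.6 and 5.5 times as long as Big SQL in total.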
Figure 2: Time in seconds to run the common set of 46 queries on each SQL-on-Hadoop platform (Big SQL, Impala, Hive; vertical axis from 0 to 18,000 seconds)

The same ranking was observed in the multi-user test, which consisted of four concurrent query streams.

Big SQL means Big Investment Protection

The good news for IBM customers is that Big SQL offers significant investment protection. Not only is Big SQL the only SQL-on-Hadoop implementation tested that was able to run all the queries, it was also the fastest, and the only engine to scale to a 30 TB dataset size. What's even better is that customers don't need to compromise on standards. Rigorous SQL compatibility means that customers can:

- Leverage existing investments in software, tools and people skills
- Run existing applications, where appropriate, over SQL-on-Hadoop data stores
- Run queries faster, more efficiently and at a larger scale, translating into lower operations costs

Value beyond compliance

Standards compliance and performance are essential, but it is also important that a chosen SQL implementation play nice with others. In Hadoop, playing nice means a number of things:

- Supporting open data formats
- Using standard client-side database drivers
- Supporting built-in functions that SQL users expect
- Providing sophisticated security capabilities
- Providing federated access to multiple data sources

Open, standard data formats

SQL is a useful language. Thanks to standardization and decades of maturation, it is well known and adept at solving many problems. It is not, however, the only language available, nor is it always the best solution for every problem. Hadoop has an ever-expanding array of languages and tools for analyzing large datasets, but to use these rich tools, data needs to exist in standard Hadoop data formats. Hadoop enthusiasts will be pleased to know that there is no such thing as a Big SQL data store.
A table defined in Hive is a table defined in Big SQL, and vice versa. Big SQL supports 100% native HDFS file formats. This is not true of all distributions, and customers should be wary of SQL-on-Hadoop implementations that introduce their own proprietary metadata.

Common client-side drivers

Beyond the ability to share SQL across platforms, Big SQL supports standard IBM client drivers, allowing the same set of standards-compliant JDBC, JCC, ODBC, CLI and .NET drivers to be used across multiple databases and operating systems. Clients using these drivers can access IBM Big SQL, DB2, IBM Informix and third-party database environments transparently. Combining a standard SQL implementation with industry-standard drivers enlarges the number of ISV applications that can interact seamlessly with IBM's Big SQL.

Built-in functions drive productivity

Having an SQL implementation that merely works is different from having an environment that makes users productive. Incorporating a rich library of over 250 built-in functions along with SQL OLAP functions, Big SQL is built for analytics. It provides advanced features including sub-query support, additional SQL types and global (session) variables. With these additional features, users can simply do more things and answer more questions from within the SQL environment. When using other SQL solutions that may lack these features, users may find themselves writing custom code to implement capabilities already built into Big SQL.

Security and auditing

For some SQL implementations, security is an afterthought. Big SQL was built with security in mind. User authentication is handled using standard mechanisms including LDAP and Kerberos, so that Big SQL fits seamlessly into your enterprise environment. Big SQL supports flexible authorization controls based on users, groups and roles. It uses the standard SQL GRANT and REVOKE syntax familiar to database administrators. In addition to basic table-level access controls, Big SQL supports fine-grained row and column access control (RCAC). Fine-grained access control and features like data masking help expand the range of solutions to which Big SQL is applicable. In addition to flexible authentication and authorization, Big SQL also provides extensive auditing facilities. In short, Big SQL brings the rich security features that RDBMS administrators expect to the world of Hadoop.

Federated queries

In modern data centers, data seldom exists in one place. Some data will exist in relational databases, and other data will be in data warehouses or specialized column-oriented databases. Big SQL supports rich federation capabilities, allowing users to write queries that access not only Hadoop-based data but other databases as well. A single query may join data from Big SQL, Hive on Hadoop, a table in a Teradata warehouse, and an Oracle database.

So who leads the herd?

The findings of this benchmark are compelling. Organizations are heavily invested in SQL. The last thing customers need is one SQL dialect for Hadoop and another for their existing database environments.
The fact that Big SQL was the only SQL-on-Hadoop implementation able to actually run the Hadoop-DS workload is important. The fact that it is also faster, more scalable and more stable, and has a richer set of features, is very impressive indeed.

- Big SQL was the only implementation able to run the full Hadoop-DS benchmark with all 99 queries
- Big SQL delivered over three times the performance of the nearest competitor in the single-user test
- Big SQL was the only offering able to scale to 30 TB and run the full workload at that scale

This result should not be surprising. IBM invented SQL, after all, and has over 30 years of experience building SQL query engines and optimizers. When it comes to SQL-on-Hadoop, IBM InfoSphere BigInsights with Big SQL clearly leads the herd.

For More Information

To learn more about Big SQL, download the free IBM whitepaper "SQL-on-Hadoop without compromise" at source=sw-infomgt&s_pkg=ov23626

To try IBM Big SQL for free, download IBM's free InfoSphere BigInsights QuickStart Edition, or run BigInsights in the cloud at
Please note: Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

© Copyright IBM Corporation 2014
IBM Canada
3600 Steeles Ave East
Markham, Ontario L3R 9Z7
Produced in Canada
October 2014
All Rights Reserved

IBM, the IBM logo, ibm.com, BigInsights, Cognos, DB2, Informix, InfoSphere, PureData and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at ibm.com/legal/copytrade.shtml

TPC Benchmark, TPC-DS, and QphDS are trademarks of the Transaction Processing Performance Council.
Cloudera, the Cloudera logo, and Cloudera Impala are trademarks of Cloudera.
Hortonworks is a trademark of Hortonworks Inc.
Hadoop and Hive are trademarks of the Apache Software Foundation.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other company, product and service names may be trademarks or service marks of others.

References in this publication to IBM products and services do not imply that IBM intends to make them available in all countries in which IBM operates.

IMW14799-USEN-00
More informationWhy Big Data in the Cloud?
Have 40 Why Big Data in the Cloud? Colin White, BI Research January 2014 Sponsored by Treasure Data TABLE OF CONTENTS Introduction The Importance of Big Data The Role of Cloud Computing Using Big Data
More informationUbuntu and Hadoop: the perfect match
WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely
More informationHadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard
Hadoop and Relational base The Best of Both Worlds for Analytics Greg Battas Hewlett Packard The Evolution of Analytics Mainframe EDW Proprietary MPP Unix SMP MPP Appliance Hadoop? Questions Is Hadoop
More informationInformation management software solutions White paper. Powerful data warehousing performance with IBM Red Brick Warehouse
Information management software solutions White paper Powerful data warehousing performance with IBM Red Brick Warehouse April 2004 Page 1 Contents 1 Data warehousing for the masses 2 Single step load
More informationConstructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
More informationIBM Content Integrator Enterprise Edition, Version 8.5.1
IBM Software Information Management IBM Content Integrator Enterprise Edition, Version 8.5.1 Highlights Enriches portals and key business applications with federated access to content stored in multiple
More informationSQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box)
SQL Server 2012 Gives You More Advanced Features (Out-Of-The-Box) SQL Server White Paper Published: January 2012 Applies to: SQL Server 2012 Summary: This paper explains the different ways in which databases
More informationApache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah
Apache Hadoop: The Pla/orm for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah 1 The Problems with Current Data Systems BI Reports + Interac7ve Apps RDBMS (aggregated
More informationWell packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances
INSIGHT Oracle's All- Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages Carl W. Olofson IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA
More informationHow To Use Hp Vertica Ondemand
Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater
More informationIntroducing Oracle Exalytics In-Memory Machine
Introducing Oracle Exalytics In-Memory Machine Jon Ainsworth Director of Business Development Oracle EMEA Business Analytics 1 Copyright 2011, Oracle and/or its affiliates. All rights Agenda Topics Oracle
More informationOracle Big Data Strategy Simplified Infrastrcuture
Big Data Oracle Big Data Strategy Simplified Infrastrcuture Selim Burduroğlu Global Innovation Evangelist & Architect Education & Research Industry Business Unit Oracle Confidential Internal/Restricted/Highly
More informationBig Data for the Rest of Us Technical White Paper
Big Data for the Rest of Us Technical White Paper Treasure Data - Big Data for the Rest of Us 1 Introduction The importance of data warehousing and analytics has increased as companies seek to gain competitive
More informationBig Data Open Source Stack vs. Traditional Stack for BI and Analytics
Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.
More informationVectorwise 3.0 Fast Answers from Hadoop. Technical white paper
Vectorwise 3.0 Fast Answers from Hadoop Technical white paper 1 Contents Executive Overview 2 Introduction 2 Analyzing Big Data 3 Vectorwise and Hadoop Environments 4 Vectorwise Hadoop Connector 4 Performance
More informationAn Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database
An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct
More informationIBM Tivoli Storage FlashCopy Manager
IBM Storage FlashCopy Manager Online, near-instant snapshot backup and restore of critical business applications Highlights Perform near-instant application-aware snapshot backup and restore, with minimal
More informationESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
More informationEinsatzfelder von IBM PureData Systems und Ihre Vorteile.
Einsatzfelder von IBM PureData Systems und Ihre Vorteile demirkaya@de.ibm.com Agenda Information technology challenges PureSystems and PureData introduction PureData for Transactions PureData for Analytics
More informationDiscovering Business Insights in Big Data Using SQL-MapReduce
Discovering Business Insights in Big Data Using SQL-MapReduce A Technical Whitepaper Rick F. van der Lans Independent Business Intelligence Analyst R20/Consultancy July 2013 Sponsored by Copyright 2013
More informationUser Pass-Through Authentication in IBM Cognos 8 (SSO to data sources)
User Pass-Through Authentication in IBM Cognos 8 (SSO to data sources) Nature of Document: Guideline Product(s): IBM Cognos 8 BI Area of Interest: Security Version: 1.2 2 Copyright and Trademarks Licensed
More informationIntel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache
More informationOracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
More informationScenario 2: Cognos SQL and Native SQL.
Proven Practice Scenario 2: Cognos SQL and Native SQL. Product(s): IBM Cognos ReportNet and IBM Cognos 8 Area of Interest: Performance Scenario 2: Cognos SQL and Native SQL. 2 Copyright Copyright 2008
More informationIBM InfoSphere Optim Data Masking solution
IBM InfoSphere Optim Data Masking solution Mask data on demand to protect privacy across the enterprise Highlights: Safeguard personally identifiable information, trade secrets, financials and other sensitive
More informationBuilding Your Big Data Team
Building Your Big Data Team With all the buzz around Big Data, many companies have decided they need some sort of Big Data initiative in place to stay current with modern data management requirements.
More informationIBM WebSphere Application Server Family
IBM IBM Family Providing the right application foundation to meet your business needs Highlights Build a strong foundation and reduce costs with the right application server for your business needs Increase
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationOracle Big Data Management System
Oracle Big Data Management System A Statement of Direction for Big Data and Data Warehousing Platforms O R A C L E S T A T E M E N T O F D I R E C T I O N A P R I L 2 0 1 5 Disclaimer The following is
More informationHow to Deliver Measurable Business Value with the Enterprise CMDB
How to Deliver Measurable Business Value with the Enterprise CMDB James Moore jdmoore@us.ibm.com Product Manager, Business Service, Netcool/Impact 2010 IBM Corporation Agenda What is a CMDB? What are CMDB
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationOPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT
WHITEPAPER OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT A top-tier global bank s end-of-day risk analysis jobs didn t complete in time for the next start of trading day. To solve
More informationHow to Choose Between Hadoop, NoSQL and RDBMS
How to Choose Between Hadoop, NoSQL and RDBMS Keywords: Jean-Pierre Dijcks Oracle Redwood City, CA, USA Big Data, Hadoop, NoSQL Database, Relational Database, SQL, Security, Performance Introduction A
More informationTeradata s Big Data Technology Strategy & Roadmap
Teradata s Big Data Technology Strategy & Roadmap Artur Borycki, Director International Solutions Marketing 18 March 2014 Agenda > Introduction and level-set > Enabling the Logical Data Warehouse > Any
More informationThe New Promise of Business Intelligence
IBM Software Group White Paper Business Analytics The New Promise of Business Intelligence 2 The New Promise of Business Intelligence Introduction: A lot has changed since business intelligence first came
More informationNext-Generation Cloud Analytics with Amazon Redshift
Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional
More informationDelivering Real-World Total Cost of Ownership and Operational Benefits
Delivering Real-World Total Cost of Ownership and Operational Benefits Treasure Data - Delivering Real-World Total Cost of Ownership and Operational Benefits 1 Background Big Data is traditionally thought
More informationInfomatics. Big-Data and Hadoop Developer Training with Oracle WDP
Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools
More informationMicrosoft s SQL Server Parallel Data Warehouse Provides High Performance and Great Value
Microsoft s SQL Server Parallel Data Warehouse Provides High Performance and Great Value Published by: Value Prism Consulting Sponsored by: Microsoft Corporation Publish date: March 2013 Abstract: Data
More informationFederated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov
Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA by Christian Tzolov @christzolov Whoami Christian Tzolov Technical Architect at Pivotal, BigData, Hadoop, SpringXD,
More informationSo What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
More informationCopyright 2012, Oracle and/or its affiliates. All rights reserved.
1 Oracle Big Data Appliance Releases 2.5 and 3.0 Ralf Lange Global ISV & OEM Sales Agenda Quick Overview on BDA and its Positioning Product Details and Updates Security and Encryption New Hadoop Versions
More informationORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION
ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION EXECUTIVE SUMMARY Oracle business intelligence solutions are complete, open, and integrated. Key components of Oracle business intelligence
More informationBig Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
More information