INSIGHT

Oracle's All-Out Assault on the Big Data Market: Offering Hadoop, R, Cubes, and Scalable IMDB in Familiar Packages

Carl W. Olofson

Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com

IDC OPINION

The Big Data space is rapidly evolving. The first wave of adoption involved Web-based companies such as online retailers, service providers, and social media firms. These companies adopted open source technologies such as Apache Hadoop and used considerable in-house technical expertise to build business solutions on top of these open source foundations. The second wave will involve businesses that both lack technical teams of the same size and depth as the Web-based companies and are averse to the risk and cost associated with large investments in original software development. These businesses will be attracted to finished products from established companies that offer short paths to business analytic solutions using Big Data technologies. Oracle is seeking to appeal to such firms with:

- Well-packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances
- Products offered in a way that enables users to integrate them into their existing Oracle Database and Fusion Middleware environment
- Technologies that include the Big Data capabilities in highest demand, including Hadoop, support for the R language, and scalable in-memory database functionality (IMDB)

IN THIS INSIGHT

This IDC Insight considers a number of key product announcements made by Oracle in January and February 2012 as well as their role in the company's strategy with respect to Big Data and their likely impact on the software markets associated with Big Data technology. The most recent announcement concerns Oracle Advanced Analytics, an option of Oracle Database 11g.
This announcement aligns strategically with the following three product announcements that establish comprehensive Oracle coverage of the Big Data space:

- Oracle Exalytics In-Memory Machine
- Oracle Big Data Appliance
- Oracle TimesTen In-Memory Database 11g Release 2

Filing Information: February 2012, IDC #233348, Volume: 1, Tab: Vendors
Database Management and Data Integration Software: Insight
Taken together, these products address three key Big Data areas: advanced and large-scale analytics, Hadoop-based data classification and extraction, and scalable in-memory database (IMDB) technology.

SITUATION OVERVIEW

Highlights

On February 8, 2012, Oracle announced general availability of Oracle Advanced Analytics. This option of Oracle Database 11g Enterprise Edition includes Oracle Data Mining and a new component called Oracle R Enterprise, which embeds R analytic capability in the database server. Previously, Oracle announced the Oracle Exalytics In-Memory Machine and the Oracle Big Data Appliance at Oracle OpenWorld 2011. In mid-January 2012, the company announced pricing and general availability for these two products plus a greatly enhanced version of the in-memory relational database management system (RDBMS), Oracle TimesTen. Taken together, this database option and these three products address key areas of the Big Data space and represent a significant move by Oracle to establish itself as a major Big Data player.

IDC identifies the following three key areas of Big Data:

- Large-scale advanced analytics
- Hadoop-driven Big Data processing
- Scalable in-memory database management

This combination represents a comprehensive approach to the Big Data problem space. This Insight considers each area in turn, focusing on how Oracle is addressing it.
Analysis

Oracle describes its approach to the Big Data space as encompassing four key stages:

- Acquire: Collect, ingest, and format data for analysis
- Organize: Put data into an order that supports either deep analysis or integration into a larger structured data collection, such as a data warehouse
- Analyze: Perform either standard query-based/online analytical processing (OLAP) analysis or deep statistical analysis on the resulting data set
- Decide: Yield results that can drive both tactical and strategic business decisions

The Oracle Big Data Appliance takes the user from the Acquire to the Organize stage; the Oracle Exadata Database Machine (or other Oracle Database 11g Enterprise Edition installation) with the Oracle Advanced Analytics option takes the user from the Organize to the Analyze stage; and the Oracle Exalytics In-Memory Machine takes the user from the Analyze to the Decide stage.
These products (note that Oracle Exalytics In-Memory Machine includes Oracle TimesTen) fall into the three functional areas described in this Insight as key elements of the Big Data space.

Large-Scale Advanced Analytics

This functional area includes the ability to accumulate large amounts of data in a scalable space for high-performance deep analysis. Oracle is addressing this area with two product offerings:

- Oracle Advanced Analytics is an Oracle Database 11g Enterprise Edition option that includes Oracle Data Mining and Oracle R Enterprise. It is for those who wish to perform deep data mining and analytics driven by the R language, with those analytics executing in the database engine.
- Oracle Exalytics In-Memory Machine is for those seeking an engineered system that is preconfigured to support classic online analytical processing using in-memory cubes powered by Oracle Essbase, or relational data held in memory by Oracle TimesTen for fast execution. (Note that Exalytics can support large data sets that extend beyond the main memory capacity of the system by sending SQL queries to a back-end database such as Oracle running on Exadata.)

Oracle Advanced Analytics

This option of Oracle Database 11g Enterprise Edition has two components: Oracle Data Mining and Oracle R Enterprise. The former is an upgraded version of the data mining option that Oracle has offered for a number of years. The latter is a capability embedded in the database engine that allows the user to build R analytics that execute in the database, close to the data, for better performance. The system allows R users to access table data within the database using the familiar variables and other constructs of the R language.
Data retrieval, statistical and predictive analysis operations, and advanced numerical computations expressed in R are converted into SQL and executed under the covers, so the R programmer does not need expertise in relational database technology or in the structure of the database in question. The role of this option is to allow "quants" who prefer R for deep analytics to use that language in a high-performance way directly against the database data, rather than as an external facility that requires considerable configuration to set up.

Note that Oracle Advanced Analytics is a database option and so can be used with any installation of Oracle Database 11g Enterprise Edition, including within the Oracle Exadata Database Machine. When Oracle Advanced Analytics is used with the Oracle Real Application Clusters (RAC) option of Oracle Database, or within the Oracle Exadata Database Machine (which includes RAC), the user also takes advantage of the scalability of parallel SQL execution, which IDC also considers a key Big Data characteristic for relational databases.
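The general idea of expressing an analysis in a client language while the work runs as SQL inside the database can be illustrated with a minimal sketch. The sketch below is written in Python and uses SQLite as a stand-in database. It is not Oracle R Enterprise and does not reflect its actual interfaces; the class `TableProxy`, its methods, and the `sales` table are hypothetical names introduced only for illustration.

```python
import sqlite3


class TableProxy:
    """Hypothetical stand-in for a database table exposed as a client-side variable."""

    def __init__(self, conn, table, where=None):
        self.conn = conn
        self.table = table
        self.where = where  # (sql_fragment, params) or None

    def filter(self, column, op, value):
        # Record the predicate instead of fetching rows to the client.
        # Identifiers are trusted in this sketch; only the value is bound.
        if op not in {"=", "<", ">", "<=", ">="}:
            raise ValueError(f"unsupported operator: {op}")
        return TableProxy(self.conn, self.table, (f"{column} {op} ?", (value,)))

    def _sql(self, select_expr):
        sql = f"SELECT {select_expr} FROM {self.table}"
        params = ()
        if self.where:
            sql += f" WHERE {self.where[0]}"
            params = self.where[1]
        return sql, params

    def mean(self, column):
        # The aggregate runs inside the database engine; one number comes back.
        sql, params = self._sql(f"AVG({column})")
        return self.conn.execute(sql, params).fetchone()[0]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 100.0), ("east", 300.0), ("west", 50.0)],
    )
    sales = TableProxy(conn, "sales")
    # Reads like client-side code, but executes as a single SQL statement.
    return sales.filter("region", "=", "east").mean("amount")


if __name__ == "__main__":
    print(demo())
```

The point of the design is that the analyst never writes SQL and no table rows are copied to the client; only the generated statement travels to the database and only the result travels back.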
Exalytics In-Memory Machine

This product is used to perform deep analysis of large amounts of business intelligence (BI) data quickly. It combines Oracle Business Intelligence Enterprise Edition (OBIEE) with enhanced visualization capabilities and performance optimizations, an optimized version of Oracle TimesTen In-Memory Database with analytic extensions, and an optimized version of Oracle Essbase for analyzing OLAP cubes in memory. It is delivered as an engineered system, with the hardware configured specifically for the Oracle TimesTen In-Memory Database for Exalytics and Oracle Business Intelligence Foundation software, which includes Oracle Business Intelligence Enterprise Edition and Oracle Essbase. The idea, as with all of Oracle's engineered systems, is to deliver a product that can be set up and used with a minimum of effort, involving virtually no installation and only the tuning and configuration necessary for the specific analysis required by the user. Other products that feature IMDB functionality with analytics require considerable installation and configuration before use.

Hadoop-Driven Big Data Processing

This is the most mature of the new technology areas in the Big Data space. It involves the ability to accept either complex, heterogeneous (or unstructured) data or high-volume streams of machine-generated data; analyze the data for elements of value or for meaningful patterns; and provide analytical results or structured output, or both, generally leading to further analysis. This capability is generally addressed using the MapReduce paradigm, and the most common form of that paradigm is the open source Apache Hadoop set of technologies.

Oracle Big Data Appliance

Oracle Big Data Appliance is an engineered system that provides a preconfigured installation of Cloudera's distribution that includes Apache Hadoop and associated project software.
Oracle provides frontline support for this software, with back-end support from Cloudera, and enables the user to choose between standard Hadoop HDFS-based HBase and the Oracle NoSQL Database (developed from Berkeley DB) as the data management engine for query and analysis. (It should be noted that Oracle is among a number of vendors offering faster, more flexible alternatives to HBase for Hadoop users.) Hadoop applications can be integrated into an Oracle environment using the Oracle Big Data Connectors (a package that includes optimized integration into the database), Oracle Loader for Hadoop, Oracle Data Integration Application Adapter for Hadoop, Oracle R Connector for Hadoop, and Oracle Direct Connector for HDFS. The Hadoop installation is a full Cloudera distribution that includes Cloudera Manager, all fully supported by Oracle, with Cloudera providing level 2 and 3 support. It also includes an open source distribution of R and the Oracle NoSQL Database Community Edition. All are packaged in an appliance format on a machine with 216 CPU cores, 864GB of RAM, and 648TB of raw disk storage, connected internally via a 40Gbps InfiniBand network.
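For readers unfamiliar with the MapReduce paradigm that Hadoop implements, the following is a minimal single-process sketch in Python of its three steps (map, group by key, reduce), applied to counting words in log-like records. It does not use the Hadoop API, and it omits the distribution of work across nodes that gives Hadoop its scalability; the function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (key, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group intermediate values by key, as a framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    # In a real cluster, map and reduce calls run in parallel on many nodes.
    intermediate = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["error disk full", "error network"]))
```

Because each map call sees only one record and each reduce call sees only one key, the same logic can be spread over many machines without the functions needing to coordinate with one another.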
Scalable IMDB Management

It is well understood that in-memory data management yields orders-of-magnitude better performance than any disk-based alternative. The Big Data dimension of this approach, and the one that really sets up IMDB as the future of database management generally, is the use of clustered servers on a high-speed network with peer-to-peer background replication to deliver nearly limitless scalability with solid recoverability. A number of IMDB technologies have been moving in this direction for a while, though most were nonrelational.

Oracle TimesTen 11g Release 2

The sleeper announcement of the year may be that of Oracle TimesTen 11g Release 2, which includes a scalable cache grid for in-memory relational database management that can scale to a larger size than can be supported in a single server's main memory space while retaining the high-performance characteristics of memory-based data management. Currently, such scaling can be accomplished by deploying within the Oracle Exalogic machine and using its built-in high-speed network, which can support up to eight nodes. Further scaling can be achieved by linking multiple Oracle Exalogic machines together with InfiniBand connections. This configuration is normally applied to the use of Oracle TimesTen as a cache for Oracle Database and so is called the TimesTen In-Memory Database Cache Grid. Logically, however, it could be used as a standalone database with a similar configuration, either within Exalogic or on user-configured hardware. Recoverability is assured by transaction replication from the executing server to standby or subscribing servers. Further recoverability with reduced latency is achieved by the writing of parallel logs. Oracle TimesTen can be optimized for either OLTP or analytic workloads. The analytic workload optimization includes columnar data management.
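The two ideas described above, spreading data across the memory of several servers and replicating each change to a standby for recoverability, can be illustrated with a toy sketch in Python. This is not how TimesTen is implemented: hash partitioning of keys, synchronous replication, and the names `GridNode` and `CacheGrid` are all assumptions made purely for illustration.

```python
import zlib


class GridNode:
    """One server's in-memory store (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class CacheGrid:
    """Toy grid: each key lives on one active node and is copied to its standby."""

    def __init__(self, node_count):
        self.active = [GridNode(f"active-{i}") for i in range(node_count)]
        self.standby = [GridNode(f"standby-{i}") for i in range(node_count)]

    def _slot(self, key):
        # Stable hash so a given key always maps to the same node.
        return zlib.crc32(key.encode("utf-8")) % len(self.active)

    def put(self, key, value):
        slot = self._slot(key)
        self.active[slot].rows[key] = value
        # Replicate the change to the standby (synchronously in this sketch).
        self.standby[slot].rows[key] = value

    def get(self, key):
        return self.active[self._slot(key)].rows.get(key)

    def fail_over(self, slot):
        # Promote the standby after the active node's memory is lost,
        # then seed a fresh standby from the promoted copy.
        promoted = self.standby[slot]
        self.active[slot] = promoted
        replacement = GridNode(f"standby-{slot}")
        replacement.rows = dict(promoted.rows)
        self.standby[slot] = replacement


if __name__ == "__main__":
    grid = CacheGrid(4)
    grid.put("order-1", {"total": 42})
    print(grid.get("order-1"))
```

Total capacity grows with the number of nodes because no single server holds every row, while the standby copy is what allows a lost node's data to be recovered without going back to disk.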
When used as a cache for Oracle Database, TimesTen can be configured in one of two ways: read/write caching, with parallel replication of transactions and parallel write-through to the database; or read-only caching, with multistream refresh of transactions from the database and parallel replication of the refresh transactions to standby nodes. As previously mentioned, TimesTen is also the in-memory RDBMS component of the Exalytics In-Memory Machine.

Competitive Landscape

The comprehensiveness of Oracle's approach, both to the Big Data landscape overall and to how its products fit together, represents a formidable challenge to any vendor hoping to offer end-to-end business-oriented Big Data solutions. There are, however, clear competitors in each of the Big Data areas.

FUTURE OUTLOOK

Big Data is a fast-moving space, and it is reasonable to expect that various combinations of products, old and new, will form to challenge Oracle in each of the Big Data areas described in this Insight. Some will be narrow, deep technologies that perform certain analytic functions very well. Others will be broad based. Oracle's approach, based on both software functionality and Oracle's engineered systems
strategy, can become well entrenched at user sites, however, as long as Oracle continues to move these technologies forward.

ESSENTIAL GUIDANCE

Actions to Consider

The Big Data space remains bewildering both for those in the business of making technical solutions and for users of those solutions. Some things to consider going forward are discussed in the sections that follow.

Advice for Buyers

Big Data is a fast-moving space, and approaches that seem "standard" today may not be so tomorrow. Oracle's products offer a variety of approaches to Big Data management and analysis. This offers options, but one should regard the purchase of an engineered system or appliance as an investment in the future, not just a short-term solution. Buyers should therefore be circumspect and work out their long-range plans for the proper exploitation of Big Data for the foreseeable future before making significant commitments.

Advice for Other Vendors

Oracle's Big Data offerings are well packaged and fairly complete. Competing vendors must first decide whether they want to concentrate on certain Big Data analytic or management problems or to compete at a level of breadth similar to that of Oracle. If they choose the latter, they should seek to be as comprehensive as Oracle, either on their own or through partners, and should look for details of the Oracle products that represent opportunities to win through differentiation.

Copyright Notice

This IDC research document was published as part of an IDC continuous intelligence service, providing written research, analyst interactions, telebriefings, and conferences. Visit www.idc.com to learn more about IDC subscription and consulting services. To view a list of IDC offices worldwide, visit www.idc.com/offices. Please contact the IDC Hotline at 800.343.4952, ext. 7988 (or +1.508.988.7988) or sales@idc.com for information on applying the price of this document toward the purchase of an IDC service or for information on additional copies or Web rights.
Copyright 2012 IDC. Reproduction is forbidden unless authorized. All rights reserved.