Business Case for Enterprise Big Data Deployments


August 2013

MANAGEMENT BRIEF

Business Case for Enterprise Big Data Deployments
Comparing Costs, Benefits and Risks for Use of IBM InfoSphere BigInsights and Open Source Apache Hadoop

International Technology Group
609 Pacific Avenue, Suite 102
Santa Cruz, California
Telephone:
Website: ITGforInfo.com

TABLE OF CONTENTS

EXECUTIVE SUMMARY 1
    Challenges and Solutions 1
    Open Source 2
    IBM InfoSphere BigInsights Differentiators 4
    Conclusions 6
SOLUTION SET 8
    Overview 8
    Deployment Options 8
        Servers and Storage 8
        Platform Symphony 8
        GPFS-FPO 8
DETAILED DATA 11
    Composite Profiles 11
    Cost Calculations 12
    Cost Breakdowns 12

List of Figures
1. Three-year Costs for Use of IBM InfoSphere BigInsights and Open Source Apache Hadoop for Major Applications: Averages for All Installations 1
2. IBM InfoSphere BigInsights Environment 4
3. IBM InfoSphere BigInsights Components 9
4. Composite Profiles
5. FTE Salary Assumptions
6. Three-year Cost Breakdowns 12

Copyright 2013 by the International Technology Group. All rights reserved. Material, in whole or part, contained in this document may not be reproduced or distributed by any means or in any form, including original, without the prior written permission of the International Technology Group (ITG). Information has been obtained from sources assumed to be reliable and reflects conclusions at the time. This document was developed with International Business Machines Corporation (IBM) funding. Although the document may utilize publicly available material from various sources, including IBM, it does not necessarily reflect the positions of such sources on the issues addressed in this document. Material contained and conclusions presented in this document are subject to change without notice. All warranties as to the accuracy, completeness or adequacy of such material are disclaimed. There shall be no liability for errors, omissions or inadequacies in the material contained in this document or for interpretations thereof. Trademarks included in this document are the property of their respective owners.

EXECUTIVE SUMMARY

Challenges and Solutions

What more can be said on the subject of big data? A great deal, it turns out.

Industry debate tends to focus on the role which big data analytics may play in transforming business decision-making and organizational competitiveness. The impact on these will clearly be transformative. But there is a downside. Bottlenecks are emerging that may seriously delay realization of the potential of big data in many, perhaps most organizations.

This is particularly the case for the complex of technologies that has developed around Apache Hadoop. As use of Hadoop-based systems has spread beyond social media companies, users have found that developer productivity is often poor (Hadoop requires a great deal of manual coding), and that skills shortages slow new project starts and magnify deployment times and costs.

Hadoop specialists have become among the highest-paid in the IT world. In the United States, for example, starting salaries for Hadoop developers are routinely over $100,000, and salaries for managers, data scientists, architects and other high-level specializations often top $200,000. Worldwide, Hadoop compensation is trending rapidly upward, and is expected to continue doing so for the foreseeable future.

The combination of low developer productivity and high salary levels makes for poor economics. This is, it is commonly argued, counterbalanced by the fact that most Hadoop components are open sourced and may be downloaded free of charge. But overall costs may not necessarily be lower than for vendor-managed Hadoop distributions that enable more cost-effective development and deployment.

This may be illustrated by comparisons of three-year costs for use of open source Hadoop and the IBM BigInsights Hadoop distribution for representative high-impact applications in six companies. Overall costs averaged 28 percent less for use of IBM BigInsights. These comparisons, whose results are summarized in figure 1, include software licenses, support and personnel costs for use of BigInsights, and personnel costs only for use of open source Hadoop software.

IBM InfoSphere BigInsights: 3,386.9; Open Source Apache Hadoop: 4,690.4 ($ thousands; licenses & support, personnel, ongoing personnel)
Figure 1: Three-year Costs for Use of IBM InfoSphere BigInsights and Open Source Apache Hadoop for Major Applications: Averages for All Installations

Personnel costs are for initial application development and deployment, as well as for ongoing post-production operations over a three-year period. Calculations include data scientists, architects, project managers, developers, and data and installation specialists for initial development and deployment; and developers, data specialists and system administrators for post-production operations. BigInsights costs also include licenses and support.
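The headline saving can be reproduced directly from the Figure 1 averages. The short Python sketch below does only that arithmetic; the two totals are taken from Figure 1, and nothing else is assumed.

```python
# Three-year cost averages from Figure 1, in $ thousands
biginsights_total = 3386.9   # IBM InfoSphere BigInsights
open_source_total = 4690.4   # open source Apache Hadoop

# Relative saving for BigInsights versus open source Hadoop
saving = (open_source_total - biginsights_total) / open_source_total
print(f"BigInsights three-year costs average {saving:.0%} less")  # prints 28%
```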

Comparisons are for composite profiles of financial services, health care, marketing services, media, retail and telecommunications companies. Profiles were constructed based on information supplied by 29 organizations employing BigInsights, open source tools or combinations of these. Further information on profiles, methodology and assumptions employed for calculations, along with cost breakdowns for individual companies, may be found in the Detailed Data section of this report.

Open Source

Hadoop, in its open source form, is based largely on technologies developed by Google in the early 2000s. The earliest and, to date, largest Hadoop users have been social media and e-commerce companies. In addition to Google itself, these have included Amazon.com, AOL, eBay, Facebook, LinkedIn, Twitter, Yahoo and international equivalents.

Although the field of players has since expanded to include hundreds of venture capital-funded start-ups, along with established systems and services vendors and large end users, social media businesses continue to control Hadoop. Most of the more than one billion lines of code in the Apache Hadoop stack (more than 90 percent, according to some estimates) have to date been contributed by these companies.

The priorities of this group have inevitably influenced Hadoop evolution. There tends to be an assumption that Hadoop developers are highly skilled, capable of working with raw open source code and configuring software components on a case-by-case basis as needs change. Manual coding is the norm (a minimal example of this style of hand-written code appears at the end of this section).

Decades of experience have shown that, regardless of which technologies are employed, manual coding offers lower developer productivity and greater potential for errors than more sophisticated techniques. Continuous updates may be required as business needs and data sources change. Over time, complex, inadequately documented masses of spaghetti code may be generated that are expensive to maintain and enhance. These problems have routinely affected organizations employing legacy mainframe applications. There is no sense in repeating them with new generations of software technology.

Other issues have also emerged. These include:

Stability. Hadoop and the open source software stack surrounding it are currently defined and enhanced through at least 25 separate Apache Foundation projects, sub-projects and incubators. More can be expected as the scope of the Hadoop environment expands, and new tools and technologies emerge. Initiatives tend to move at different speeds, and release dates are at best loosely coordinated. Developers are exposed to a continuous stream of changes, and instability can be a significant challenge. It becomes more difficult to plan technology strategies, project schedules and costs become less predictable, and risks of project failure increase. The potential for future interoperability problems also expands.

The Apache stack may, moreover, evolve in an unpredictable manner. Organizations may standardize upon individual components only to find that these receive declining attention over time. The pace of technology change among social media companies is a great deal faster than users in most other industries are accustomed to.

Interoperability. Few, if any, Hadoop-based systems are standalone in the sense that they do not require interoperability with other applications and databases.

All of the organizations surveyed for this report, for example, employed or planned to employ interfaces to relational databases, data warehouses, conventional analytics tools, query and reporting intranets and/or CRM and back-end systems. This was the case even for pure-play suppliers of Hadoop-based services.

Interoperability requirements were particularly significant among financial services, health care, insurance, retail and telecommunications companies. One large banking institution reported, for example, that it expected to implement 40 to 50 different interfaces before its Hadoop-based system could be brought into full operation.

Resiliency. The open source Hadoop stack includes a variety of mechanisms designed to maintain availability, and to enable failover and recovery in the event of unplanned (i.e., accidental) outages as well as planned downtime for software modifications, scheduled maintenance and other tasks. These mechanisms are, however, a great deal less mature than is the case for conventional business-critical systems. They are also, in environments characterized by numerous hand-configured components, a great deal more complex and error-prone. Vulnerabilities are magnified when systems undergo frequent changes.

Major social media companies have often realized high levels of availability. This has, however, typically required expensive investments to harden software, ensure redundancy and provide in-depth operational monitoring and response staff and procedures.

Manageability. Open source Hadoop limitations have emerged in such areas as configuration and installation, monitoring, job scheduling, workload management, tuning, and availability and security administration. Although some open source components address these issues, they are a comparatively low priority for most Apache contributors.

Users may, to some extent, compensate for these limitations through labor-intensive management practices. This approach not only translates into higher personnel costs, but is also less reliable. Open source manageability limitations may not be visible during application development and deployment. However, they will be reflected in higher ongoing full time equivalent (FTE) system administration staffing, and may impact post-production quality of service.

Support. Open source software is available only with community support; i.e., users rely upon online peer forums for enhancements, technical advice and problem resolution. This approach may prove appropriate for commonly encountered issues, although it is dependent on the willingness of others to share their time and experience. It has proved to be a great deal less reliable in dealing with organization-specific configuration issues.

The bottom-line implications may be substantial. Delays in resolving problems may undermine developer productivity, and may result in application errors, performance bottlenecks, outages, data loss and other negative effects.

As Hadoop deployments have grown, these issues have led to the appearance of vendor-managed, fee-based distributions that include enhanced tools and functions, and offer more effective customer support. Current examples include the Amazon Elastic MapReduce (Amazon EMR) web service; Cloudera's Distribution Including Apache Hadoop (CDH); EMC's Pivotal HD; the Hortonworks Data Platform; IBM InfoSphere BigInsights; the Intel Distribution for Apache Hadoop (Intel Distribution); and the MapR M series.
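To make the earlier point about hand-written code concrete, the sketch below shows what a minimal word-count job looks like when written directly against Hadoop Streaming in Python. It is purely illustrative: the input and output paths in the comments are placeholders, and none of this code comes from the organizations surveyed.

```python
#!/usr/bin/env python
# mapper.py -- emits "word<TAB>1" for every token read from stdin.
# A job of this kind is typically launched through the Hadoop Streaming jar,
# e.g. (paths are placeholders):
#   hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py \
#       -mapper mapper.py -reducer reducer.py -input /data/raw -output /data/counts
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python
# reducer.py -- Hadoop delivers mapper output sorted by key, so counts for a
# given word arrive contiguously and can be summed with a running total.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

Even this trivial job involves two hand-maintained scripts plus job-submission plumbing; this is the style of development to which the productivity comparisons above refer.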

IBM InfoSphere BigInsights Differentiators

The BigInsights environment currently includes the components shown in figure 2.

APPLICATIONS: 20+ prebuilt applications
VISUALIZATION & DISCOVERY: BigSheets; Dashboard & Visualization
ADMINISTRATION: Admin Console; Monitoring
ENABLERS: Social Data Analytics Accelerator; Machine Data Analytics Accelerator; Eclipse-based toolkits; REST API; Web application catalog; Big SQL; Application framework
INFRASTRUCTURE: Avro; HBase; Hive; Jaql; Lucene; Oozie; Pig; ZooKeeper; MapReduce; HDFS; BigIndex; Flexible Scheduler; Splittable compression; Enhanced security; Integrated Installer; Adaptive MapReduce; GPFS-FPO; High availability
DATA SOURCES & CONNECTIVITY: BoardReader; Web Crawler; Flume; R; Sqoop; JDBC; ODBC; Cognos; SPSS; MicroStrategy; SAS; DB2; Oracle; SQL Server; Teradata; Platform Computing; InfoSphere Data Explorer; InfoSphere DataStage; InfoSphere Optim; InfoSphere Warehouse; IBM PureData System for Analytics; InfoSphere Streams; InfoSphere Guardium
Legend: IBM / Open Source

Figure 2: IBM InfoSphere BigInsights Environment

Although this environment includes a full Apache Hadoop stack, it is differentiated by numerous IBM components that address the issues outlined above. In BigInsights Version 2.1, which became available in June 2013, these may be summarized as follows:

Visualization and discovery tools include IBM BigSheets, a highly customizable end user analytical solution for identification, integration and exploration of unstructured and/or structured data patterns. It employs a spreadsheet-like interface, but is more sophisticated than conventional spreadsheets, and is not limited in the amount of data it can address.

Administration tools include a Web-based administrative console providing a common, high-productivity interface for monitoring, health checking and management of all application and infrastructure components. Integrated Installer automates configuration and installation tasks for all components.

Development tools include Eclipse-based toolkits supporting the principal Hadoop development tools and languages, as well as a Web application catalog that includes ad hoc query, data import and export, and test applications designed for rapid prototyping.

Accelerators for social media and machine data analytics include prebuilt templates and components for a range of industry- and application-specific functions. Accelerators were developed based on customer experiences, and have materially improved time to value for development and deployment of Hadoop-based applications. More can be expected in the future.

Text analytic capabilities are incorporated as a standard feature of BigInsights. The social media and machine data accelerators include custom text extractors for their respective application domains.

Big SQL, introduced in BigInsights 2.1, is a native SQL query engine. It allows developers to leverage existing SQL skills and tools to query Hive, HBase or distributed file system data. Developers may use standard SQL syntax and, in some cases, IBM-supplied SQL extensions optimized for use with Hadoop.

Big SQL offers an alternative to the SQL-like HiveQL, a Hive extension developed by Facebook. Big SQL is easier to use and better aligned with mainstream SQL development tools and techniques. It also incorporates features not found in native HiveQL that can improve runtime performance for certain applications and workloads.

This approach is likely to see widespread adoption. While skilled Hadoop specialists are still comparatively rare, SQL has been in widespread use since the 1980s. There are believed to be over four million developers worldwide familiar with this language. Most large organizations have longstanding investments in SQL skill sets, and in SQL-based applications and tools.

Infrastructure enhancements are provided in such areas as large-scale indexing (BigIndex), job scheduling (BigInsights Scheduler), administration and monitoring tools, splittable text compression and security. BigInsights supports Adaptive MapReduce, which exploits IBM workload management technology in Platform Symphony. Adaptive MapReduce allows smaller MapReduce jobs to be executed more efficiently, and enables more effective, lower-overhead management of mixed workloads than open source MapReduce.

Platform Symphony is a high-performance grid middleware solution originally developed by Platform Computing, a company since acquired by IBM. In BigInsights, it can be used to replace the open source MapReduce layer while allowing MapReduce jobs to be created in the same manner. Customers may choose which to install.

IBM General Parallel File System File Placement Optimizer (GPFS-FPO) is a Hadoop-optimized implementation of the IBM GPFS distributed file system that offers an alternative to HDFS. For more than a decade, GPFS has been widely deployed for scientific and technical computing, as well as for a wide range of commercial applications. In addition to offering higher performance, GPFS enables higher cluster availability, and benefits from more effective system management, snapshot copying, failover and recovery, and security than HDFS. (IBM is not alone in adopting this approach. There has been a growing trend among Hadoop users toward use of HDFS alternatives such as the MapR file system, Cassandra and Lustre.)

High availability features include enhanced HDFS NameNode failover. The IBM implementation enables seamless and transparent failover. The process is automatic (no administrator intervention is required) and occurs more rapidly and reliably than in a conventional open source environment. More sophisticated features are offered by Platform Symphony.

Interoperability tools conform to a wide range of industry standards and/or are designed to integrate with key IBM and third-party databases and application solutions. Interfaces are provided to commonly used open source software; JDBC- and ODBC-compliant tools; IBM DB2, Oracle, Microsoft SQL Server and Teradata databases; and key IBM solutions forming part of the company's Big Data Platform. These include the InfoSphere Warehouse data warehouse framework; Cognos business intelligence; SPSS statistical modeling and analysis; InfoSphere DataStage extract, transform and load (ETL) tooling; InfoSphere Guardium for enterprise security management; Platform Symphony high-performance grid middleware; and the IBM PureData System for Analytics appliance.

The Web Crawler application automates Internet searches and collects data based on user-defined criteria. Data may be imported into BigSheets.

BigInsights is compatible with, and is often used alongside, IBM InfoSphere Streams for real-time big data analytics. This solution is architecturally comparable to open source Storm, but contains numerous IBM enhancements for development productivity, manageability, resiliency and interoperability. BigInsights contains a limited-use InfoSphere Streams license.

BigInsights capabilities are evolving rapidly. IBM has committed to integrating new open source components as these emerge, and the company is known to be working on a variety of other functional enhancements.

Conclusions

Use of Hadoop is still at an early stage. Apart from a handful of major social media companies, most Hadoop deployments have occurred over the last two years. As industry surveys have shown, many are still not in production.

Adoption, however, is expanding rapidly, and it is clear that Big Data will become a central feature of IT landscapes in most organizations. As this occurs, technology stacks and deployment patterns will inevitably change.

It can be expected that, as with previous waves of open source technology, the Hadoop market will become more segmented, and solution offerings will become more diverse. Enterprise users (a category that will probably include many midsize businesses as well as start-ups) will inevitably move to more productive, resilient, vendor-supported distributions.

Many organizations will also move toward converged Hadoop and SQL environments, applying SQL skill bases and application portfolios to new Big Data challenges. There is also a widespread move toward augmentation of SQL-based data warehouses with subsets or aggregations of Hadoop data.

These trends will increasingly leverage broader IBM differentiators. These include long-established company strengths in software engineering (BigInsights components are not only pre-integrated, but also extensively tested for optimum performance and functional transparency), customization (the ability of IBM services organizations to deliver industry- and organization-specific solutions has already emerged as a major source of BigInsights appeal) and customer support.

IBM, moreover, has decades of experience with relational technology and data warehousing. The company's SQL strengths exceed by wide margins those of any other Hadoop distributor, and its systems integration capabilities are among the world's best. As in other areas of its software business, the company has moved aggressively to recruit and support business partners. These currently include more than 300 independent software vendor (ISV) and services firms, including suppliers of a wide range of complementary tools and industry-specific solutions. The number is expanding rapidly.

The Hadoop open source community, no doubt, will remain vibrant, and use of free downloads will continue to expand. But a distinct category of enterprise solutions will clearly emerge, and these will be more strongly focused on development productivity, stability, resilience, manageability, system integration and in-depth customer support. For organizations that expect to move toward the enterprise paradigm, it may make sense to deploy IBM BigInsights sooner rather than later.

SOLUTION SET

Overview

In its present form, BigInsights includes the principal components of Apache Hadoop and related projects, along with the IBM enhancements described earlier. BigInsights is offered by IBM as a licensed software product, and through IBM SmartCloud Enterprise and third-party cloud service providers.

In addition to the flagship Enterprise Edition, which currently includes the components summarized in figure 3, IBM offers two free versions of BigInsights. Basic Edition includes the principal BigInsights open source components, along with database and Web server interfaces, and a simple management console. Quick Start Edition is a near full-function offering restricted to non-production use. It is designed to allow users to evaluate and gain experience with BigInsights enterprise features, and to prototype applications and develop proofs of concept.

Deployment Options

Servers and Storage

IBM offers BigInsights clusters built around IBM System x3550 M4 and x3630 dual-socket x86 servers acting as management and data nodes respectively. Data nodes may be configured with Near Line SAS (NL-SAS) or SATA drives. Configurations are packaged in increments of up to 20, 20 to 50 and 50+ nodes. Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES) are supported.

BigInsights may also be deployed on non-IBM x86 servers, and on IBM Power Systems with RHEL or SLES. IBM or third-party arrays may be employed for external storage.

Platform Symphony

IBM offers the option of deploying BigInsights on Platform Symphony. With this approach, Platform Symphony job scheduling and management mechanisms substitute for those of MapReduce, and additional high availability features may be leveraged.

Platform Symphony employs x86-based clusters to support applications requiring extremely high levels of performance and scalability. In principle, configurations of 10,000 or more cores are supported. According to IBM, benchmark tests have demonstrated more than seven times higher performance than open source MapReduce for large-scale social media analytics workloads.

Platform Computing is a longstanding player in HPC for scientific and technical computing, and in commercial applications in financial services, manufacturing, digital media, oil and gas, life sciences and other industries.

GPFS-FPO

GPFS-FPO has been deployed by a number of early BigInsights users in beta mode, and became generally available in Version 2.1.

In HPC applications, GPFS has demonstrated near-linear scalability in extremely large configurations: installations with more than 1,000 nodes are common, and the largest exceed 5,000 nodes. Storage volumes often run to hundreds of terabytes, and there are working petabyte-scale systems. User experiences, as well as tests run with a variety of HPC benchmarks, have demonstrated significantly higher performance than HDFS, in some cases by more than 20 times.

GPFS also incorporates a distributed metadata structure, policy-driven automated storage tiering, managed high-speed replication, and information lifecycle management (ILM) tooling.

APPLICATION DEVELOPMENT

Social Data Analytics Accelerator: Application suite enabling extraction of social media data, construction of user profiles & association with sentiment, buzz, intent & ownership. Includes customizable tools for brand management, lead generation & other common functions. Pre-integrated options for use with IBM (ex-Unica) Campaign & CCI solutions.
Machine Data Analytics Accelerator: Application suite enabling import & aggregation of structured, semi-structured and/or unstructured data from log files, meters, sensors, readers & other machine sources. Provides assists for text, faceted & timeline-based searches, pattern recognition, root cause analysis, chained analysis & other functions.
BigSheets: Spreadsheet-like tool for identification, integration & analysis of large volumes of unstructured &/or structured data. Incorporates IBM-developed analytics macros & pattern recognition technology. Highly customizable for individual user requirements.
Big SQL: Native SQL query engine; allows developers to query Hive, HBase or distributed file system data using standard SQL syntax & Hadoop-optimized SQL extensions. Allows administrators to populate Big SQL tables with data from multiple sources. JDBC & ODBC drivers support many existing SQL query tools.
Web application catalog: Includes sample query, data import & export, & test tools designed for proof-of-concept application deployment.

INFRASTRUCTURE

Avro: Data serialization & remote procedure call (RPC) framework; defines JSON data schemas.
HBase: NoSQL (non-relational) database incorporating row- & column-based table structures. Based on Google BigTable technology.
Hive: Facilitates data extraction, transformation & loading (ETL), & analysis of large HDFS data sets.
Jaql: High-level declarative query & scripting language with JSON-based data model & SQL-like interface; processes structured & unstructured data. Originally developed by IBM.
Lucene: Text search engine library.
Oozie: Workflow scheduler for Hadoop job management; describes job graphs & relationships between these.
Pig: Platform for analyzing large data sets; includes a high-level language for expressing, & infrastructure for evaluating, programs.
MapReduce: Parallel programming model for Hadoop clusters.
Hadoop Distributed File System (HDFS): Hadoop distributed file system; supports clusters built around an x86-based NameNode (master) & DataNodes. Closely integrated with MapReduce.
BigIndex: Implements Hadoop-based indexing as a native InfoSphere BigInsights capability; enables additional complex functions including distributed indexing & faceted search.
BigInsights Scheduler: Extension of the Hadoop Fair Scheduler; enables policy-based scheduling of MapReduce jobs.
Splittable compression: Expanded implementation of the Apache Lempel-Ziv-Oberhumer (LZO) algorithm allowing compressed data to run jobs on multiple mappers.
Enhanced security: Includes enhanced authentication, authorization (roles) & auditing functions. Interfaces to IBM InfoSphere Guardium solutions.
Integrated Installer: GUI-driven tool allowing rapid, automated configuration, installation & assurance of BigInsights clusters. Guided installation features facilitate administrator tasks.
Adaptive MapReduce: Platform Symphony technology that accelerates processing of small MapReduce jobs & enables more effective execution of mixed Hadoop workloads.
GPFS File Placement Optimizer (GPFS-FPO): Extension of the IBM General Parallel File System high-performance distributed file system, optimized for use in Hadoop clusters.

Figure 3: IBM InfoSphere BigInsights Components

DATA SOURCES & CONNECTIVITY

BoardReader: Interface to the BoardReader search engine; enables query access, & data download & import to the BigInsights file system.
Web Crawler: Interface to the IBM Web Crawler application for Internet data collection & organization.
Flume: Facilitates aggregation & integration of large data volumes across Hadoop clusters.
R: Enables integration of applications written in the R statistics language.
Sqoop: Enables import & export of data between SQL & Hadoop databases.
JDBC: Standard Java Database Connectivity interface to DBMS.
ODBC: Standard Open Database Connectivity interface to DBMS.
MicroStrategy, SAS: Interfaces to widely used third-party analytics tools.
Database interfaces: Interfaces to IBM DB2, Oracle Database, Microsoft SQL Server & Teradata Database.
IBM data exchanges: Enable exchange of BigInsights data with IBM Cognos Business Intelligence, InfoSphere DataStage ETL tools, the InfoSphere Warehouse data warehouse framework, Platform Symphony grid middleware, IBM PureData System for Analytics, SPSS statistical modeling & analysis, & InfoSphere Streams real-time analytics solutions.

Legend: IBM / Open Source

Figure 3 (cont.): IBM InfoSphere BigInsights Components
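As an illustration of the SQL access path provided by the Big SQL entry and the JDBC/ODBC drivers listed above, the sketch below issues a standard SQL query from Python over ODBC. The DSN name, credentials and the sales table are placeholders invented for the example, not values documented for BigInsights; an equivalent query could be run through any JDBC- or ODBC-compliant tool.

```python
# Minimal sketch of querying Big SQL over ODBC from Python.
# "BIGSQL", the credentials and the "sales" table are illustrative placeholders;
# a DSN must first be defined against the BigInsights ODBC driver.
import pyodbc

conn = pyodbc.connect("DSN=BIGSQL;UID=biadmin;PWD=example")
cursor = conn.cursor()

# Standard SQL syntax, as described for Big SQL above
cursor.execute(
    "SELECT store_id, SUM(amount) AS revenue "
    "FROM sales GROUP BY store_id ORDER BY revenue DESC"
)
for store_id, revenue in cursor.fetchall():
    print(store_id, revenue)

cursor.close()
conn.close()
```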

DETAILED DATA

Composite Profiles

The calculations presented in this report are based upon the six composite profiles shown in figure 4. FTEs refers to numbers of full time equivalent personnel.

Health Care Company
Applications: Health care insurance provider claims analysis for quality of care recommendations & cost/profitability variables. 80 TB disk storage.
IBM InfoSphere BigInsights FTEs: development & deployment (6 months): 5.25; post-production operations: 2.95
Open source FTEs: development & deployment (6 months): 8.25; post-production operations: 4.3

Financial Services Company
Applications: Diversified retail bank customer sentiment analysis of social media, correspondence & transaction records for loyalty program optimization. Data warehouse interface. 130 TB disk storage.
IBM InfoSphere BigInsights FTEs: development & deployment (8 months): 7.5; post-production operations: 3.15
Open source FTEs: development & deployment: 10 months; post-production operations: 6.0

Retail Company
Applications: Comparative analysis of customer online & in-store purchasing behavior. Sources include web logs, point of sale & other data. Predictive analysis for merchandising applications. Data warehouse & decision support interfaces. 200 TB disk storage.
IBM InfoSphere BigInsights FTEs: development & deployment (12 months): 11.3; post-production operations: 4.75
Open source FTEs: development & deployment (15 months): 17.0; post-production operations: 8.5

Media Company
Applications: Analysis of web logs for multiple properties to determine usage patterns, customer profiling, tracking ad event activity & identifying new marketing opportunities. 300 TB disk storage.
IBM InfoSphere BigInsights FTEs: development & deployment (7 months): 8.4; post-production operations: 3.25
Open source FTEs: development & deployment: 9 months; post-production operations: 5.0

Marketing Services Company
Applications: Analysis of customer e-mail traffic for demographic & sentiment tracking, campaign management & other applications. 350 TB disk storage.
IBM InfoSphere BigInsights FTEs: development & deployment (6 months): 7.55; post-production operations: 2.6
Open source FTEs: development & deployment: 8 months; post-production operations: 4.5

Telecommunications Company
Applications: Analysis of call detail records (CDRs), Internet & social media activity to identify cross-sell opportunities & improve loyalty program effectiveness. Interface to CIS, data warehouse & operational systems. 500 TB disk storage.
IBM InfoSphere BigInsights FTEs: development & deployment (9 months): 8.85; post-production operations: 3.0
Open source FTEs: development & deployment: 12 months; post-production operations: 5.5

Figure 4: Composite Profiles

Profiles were constructed using information supplied by 14 companies using open source Hadoop, the same number using BigInsights, and one using both. For each of the industries shown above, comparisons were based on companies of approximately the same size, with generally similar business profiles and applications. Companies were based in the United States (26) and Europe (3).

Companies supplied information on applications; development and deployment times for these; and numbers of FTE personnel for (1) application development and deployment, and (2) ongoing post-production operations. Because job descriptions and titles often varied between companies, numbers of FTEs for equivalent specializations were in some cases estimated by the International Technology Group.

Cost Calculations

Personnel costs were calculated for numbers of FTEs based on the annual salary assumptions shown in figure 5. The same assumptions were employed for use of BigInsights and open source Hadoop tools.

Data scientist (1): $200K
Architect/equivalent (1): $189K
Project manager (1): $154K
Lead developer (1): $140K
Developer (1) (2): $132K
Data specialist (1) (2): $147K
Installation specialist (1): $135K
System administrator (2): $104K
(1) Development and deployment; (2) Post-production operations

Figure 5: FTE Salary Assumptions

Calculations were based on numbers of FTEs for the applicable periods. For the health care company, for example, costs were calculated for numbers of development and deployment FTEs for six months, while post-production personnel costs were calculated for 36 - 6 = 30 months. Salaries were increased by a percentage allowance for benefits, bonuses and other per capita costs. A worked sketch of this arithmetic follows figure 6 below.

Software costs for use of BigInsights were calculated based on IBM pricing per terabyte of disk storage for the capacities shown in figure 6. As BigInsights license fees include one year of software maintenance (SWMA) coverage at no additional charge, support costs are for two years. Calculations allowed for user-reported discounts.

Cost Breakdowns

Breakdowns for individual profiles are shown in figure 6.

Figure 6: Three-year Cost Breakdowns ($ thousands). For each company type (Health Care, Financial Services, Retail, Media, Marketing Services, Telecom), the figure breaks out licenses & support, development and deployment personnel, ongoing operations personnel, personnel total and three-year total for IBM InfoSphere BigInsights and for open source Apache Hadoop.
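The personnel side of these calculations can be sketched as follows. The salaries are the Figure 5 assumptions, and the 6-month/30-month split and FTE totals (5.25 and 2.95) match the health care BigInsights profile in Figure 4; however, the role-by-role split of those FTEs and the benefits uplift percentage are illustrative assumptions, since the report's actual values for these are not reproduced here.

```python
# Hedged sketch of the three-year personnel cost arithmetic described above.
# Salaries come from Figure 5 ($ thousands per year); the per-role FTE split and
# the 30% uplift are illustrative assumptions, not figures from the report.
SALARIES = {
    "data scientist": 200, "architect": 189, "project manager": 154,
    "lead developer": 140, "developer": 132, "data specialist": 147,
    "installation specialist": 135, "system administrator": 104,
}
UPLIFT = 0.30  # assumed allowance for benefits, bonuses and other per capita costs

def personnel_cost(ftes_by_role, months):
    """Personnel cost in $ thousands for a given FTE mix over a given period."""
    return sum(SALARIES[role] * ftes * (1 + UPLIFT) * months / 12
               for role, ftes in ftes_by_role.items())

# Health care BigInsights profile: 6 months of development and deployment,
# then 36 - 6 = 30 months of post-production operations.
development = {"data scientist": 1, "architect": 1, "project manager": 1,
               "developer": 1.5, "data specialist": 0.5,
               "installation specialist": 0.25}          # totals 5.25 FTEs
operations = {"developer": 1, "data specialist": 1,
              "system administrator": 0.95}              # totals 2.95 FTEs

total = personnel_cost(development, 6) + personnel_cost(operations, 30)
print(f"Three-year personnel cost: ${total:,.1f}K")
```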

ABOUT THE INTERNATIONAL TECHNOLOGY GROUP

ITG sharpens your awareness of what's happening and your competitive edge... this could affect your future growth and profit prospects.

International Technology Group (ITG), established in 1983, is an independent research and management consulting firm specializing in information technology (IT) investment strategy, cost/benefit metrics, infrastructure studies, deployment tactics, business alignment and financial analysis.

ITG was an early innovator and pioneer in developing total cost of ownership (TCO) and return on investment (ROI) processes and methodologies. In 2004, the firm received a Decade of Education Award from the Information Technology Financial Management Association (ITFMA), the leading professional association dedicated to education and advancement of financial management practices in end-user IT organizations.

The firm has undertaken more than 120 major consulting projects, released more than 250 management reports and white papers, and delivered more than 1,800 briefings and presentations to individual clients, user groups, industry conferences and seminars throughout the world.

Client services are designed to provide factual data and reliable documentation to assist in the decision-making process. Information provided establishes the basis for developing tactical and strategic plans. Important developments are analyzed and practical guidance is offered on the most effective ways to respond to changes that may impact complex IT deployment agendas.

A broad range of services is offered, furnishing clients with the information necessary to complement their internal capabilities and resources. Customized client programs involve various combinations of the following deliverables:

Status Reports: In-depth studies of important issues
Management Briefs: Detailed analysis of significant developments
Management Briefings: Periodic interactive meetings with management
Executive Presentations: Scheduled strategic presentations for decision-makers
Communications: Timely replies to informational requests
Telephone Consultation: Immediate response to informational needs

Clients include a cross section of IT end users in the private and public sectors representing multinational corporations, industrial companies, financial institutions, service organizations, educational institutions, federal and state government agencies as well as IT system suppliers, software vendors and service firms. Federal government clients have included agencies within the Department of Defense (e.g., DISA), Department of Transportation (e.g., F

International Technology Group
609 Pacific Avenue, Suite 102
Santa Cruz, California
Telephone:
Website: ITGforInfo.com


More information

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08

More information

Tap into Hadoop and Other No SQL Sources

Tap into Hadoop and Other No SQL Sources Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2016 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information

OnX Big Data Reference Architecture

OnX Big Data Reference Architecture OnX Big Data Reference Architecture Knowledge is Power when it comes to Business Strategy The business landscape of decision-making is converging during a period in which: > Data is considered by most

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads

Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads Solution Overview Cisco Unified Data Center Solutions for MapR: Deliver Automated, High-Performance Hadoop Workloads What You Will Learn MapR Hadoop clusters on Cisco Unified Computing System (Cisco UCS

More information

Making Open Source BI Viable for the Enterprise. An Alternative Approach for Better Business Decision-Making. White Paper

Making Open Source BI Viable for the Enterprise. An Alternative Approach for Better Business Decision-Making. White Paper Making Open Source BI Viable for the Enterprise An Alternative Approach for Better Business Decision-Making White Paper Aligning Business and IT To Improve Performance Ventana Research 6150 Stoneridge

More information

Oracle Big Data Handbook

Oracle Big Data Handbook ORACLG Oracle Press Oracle Big Data Handbook Tom Plunkett Brian Macdonald Bruce Nelson Helen Sun Khader Mohiuddin Debra L. Harding David Segleau Gokula Mishra Mark F. Hornick Robert Stackowiak Keith Laker

More information

Reference Architecture, Requirements, Gaps, Roles

Reference Architecture, Requirements, Gaps, Roles Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture

More information

Big Data and Industrial Internet

Big Data and Industrial Internet Big Data and Industrial Internet Keijo Heljanko Department of Computer Science and Helsinki Institute for Information Technology HIIT School of Science, Aalto University keijo.heljanko@aalto.fi 16.6-2015

More information

What s Happening to the Mainframe? Mobile? Social? Cloud? Big Data?

What s Happening to the Mainframe? Mobile? Social? Cloud? Big Data? December, 2014 What s Happening to the Mainframe? Mobile? Social? Cloud? Big Data? Glenn Anderson IBM Lab Services and Training Today s mainframe is a hybrid system z/os Linux on Sys z DB2 Analytics Accelerator

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

Qsoft Inc www.qsoft-inc.com

Qsoft Inc www.qsoft-inc.com Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:

More information

How Cisco IT Built Big Data Platform to Transform Data Management

How Cisco IT Built Big Data Platform to Transform Data Management Cisco IT Case Study August 2013 Big Data Analytics How Cisco IT Built Big Data Platform to Transform Data Management EXECUTIVE SUMMARY CHALLENGE Unlock the business value of large data sets, including

More information

TRAINING PROGRAM ON BIGDATA/HADOOP

TRAINING PROGRAM ON BIGDATA/HADOOP Course: Training on Bigdata/Hadoop with Hands-on Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30-17:30 Hrs Venue: Eagle Photonics Pvt Ltd First Floor, Plot No 31, Sector 19C, Vashi,

More information

Oracle s Big Data solutions. Roger Wullschleger.

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT WHITEPAPER OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT A top-tier global bank s end-of-day risk analysis jobs didn t complete in time for the next start of trading day. To solve

More information

Please give me your feedback

Please give me your feedback Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &

More information

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved. Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!

More information

The Future of Data Management with Hadoop and the Enterprise Data Hub

The Future of Data Management with Hadoop and the Enterprise Data Hub The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees

More information

HadoopTM Analytics DDN

HadoopTM Analytics DDN DDN Solution Brief Accelerate> HadoopTM Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award winning SFA platform to really accelerate

More information

CA Technologies Big Data Infrastructure Management Unified Management and Visibility of Big Data

CA Technologies Big Data Infrastructure Management Unified Management and Visibility of Big Data Research Report CA Technologies Big Data Infrastructure Management Executive Summary CA Technologies recently exhibited new technology innovations, marking its entry into the Big Data marketplace with

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based

More information

Staying agile with Big Data

Staying agile with Big Data An Ovum white paper for Red Hat Publication Date: 09 Sep 2014 Tony Baer Summary Catalyst Like any major technology project, organizations implementing Big Data projects face challenges with aligning business

More information

Enable your Modern Data Architecture by delivering Enterprise Apache Hadoop

Enable your Modern Data Architecture by delivering Enterprise Apache Hadoop Modern Data Architecture with Enterprise Apache Hadoop Hortonworks. We do Hadoop. Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Our Mission: Enable your Modern Data Architecture

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information