Interactive data analytics drive insights




Big data

By Armando Acosta and Joey Jablonski

The Apache Hadoop platform speeds storage, processing and analysis of big, complex data sets, supporting innovative tools that draw immediate insights.

Daniel Davis/Invodo/S&P. Screen images courtesy of Landmark Software and Services

Big data has taken a giant leap beyond its large-enterprise roots, entering boardrooms and data centers across organizations of all sizes and industries. The Apache Hadoop platform has evolved along with the big data landscape and emerged as a major option for storing, processing and analyzing large, complex data sets. In comparison, traditional relational database management or enterprise data warehouse tools often lack the capability to handle such large amounts of diverse data effectively.

Hadoop enables distributed parallel processing of high-volume, high-velocity data across industry-standard servers that both store and process the data. Because it supports structured, semi-structured and unstructured data from disparate systems, the highly scalable Hadoop framework allows organizations to store and analyze more of their data than before to extract business insights. As an open platform for data management and analysis, Hadoop complements existing data systems to bring organizational capabilities into the big data era as analytics environments grow more complex.

Evolving data needs

Early adopters tended to utilize Hadoop for batch processing; prime use cases included data warehouse optimization and extract, transform,

load (ETL) processes. Now, IT leaders are expanding the application of Hadoop and related technologies to customer analytics, churn analysis, network security and fraud prevention, many of which require interactive processing and analysis.

As organizations transition to big data technologies, Hadoop has become essential for enabling predictive analytics that use multiple data sources and types. Predictive analytics helps organizations in many different industries answer business-critical questions that had been beyond their reach using basic spreadsheets, databases or business intelligence (BI) tools. For example, financial services companies can move from asking "How much does each customer have in their account?" to answering sophisticated business enablement questions such as "What upsell should I offer a 25-year-old male with checking and IRA accounts?" Retail businesses can progress from "How much did we sell last month?" to "What packages of products are most likely to sell in a given market region?" A healthcare organization can predict which patient is most likely to develop diabetes and when.

Using Hadoop and analytical tools to manage and analyze big data, organizations can personalize each customer experience, predict manufacturing breakdowns to avoid costly repairs and downtime, maximize the potential for business teams to unlock valuable insights, drive increased revenue and more. [See the sidebar, "Doing the (previously) impossible."]

Doing the (previously) impossible

Apache Hadoop and big data analytics capabilities enable organizations to do what they couldn't do before, whether that means making memorable customer experiences or optimizing operations.

Personalized content. A digital media company turned to Hadoop when burgeoning data volumes hindered its mission to simplify marketers' access to data that would let them tailor content to individual customers.
The company's move to Cloudera Enterprise, powered by Dell PowerEdge servers, enabled complex, large-scale data processing that delivered greater than 90 percent accuracy for its content personalization services. Moreover, the 24x7 reliability of the Hadoop platform lets the company provide the data its customers need, when they need it.

Product quality management. To help global manufacturers efficiently manage product quality, Omneo implemented a software solution based on the Cloudera Distribution including Apache Hadoop (CDH) running on a cluster of Dell PowerEdge servers. Using the solution, Omneo customers can quickly search, analyze and mine all their data in a single place, so they can identify and resolve emerging supply chain issues. "We are able to help customers search billions of records in seconds with Dell infrastructure and support, Cloudera's Hadoop solution, and our knowledge of supply chain and quality issues," says Karim Lokas, senior vice president of marketing and product strategy for Omneo, a division of the global enterprise manufacturing software firm Camstar Systems. "With the visibility provided by this solution, manufacturers can put out more consistent, better products and have less suspect product go out the door."

Information security services. Dell SecureWorks is on deck 24 hours a day, 365 days a year, to help protect customer IT assets against cyberthreats. To meet its enormous data processing challenges, Dell SecureWorks deployed the Dell Cloudera Apache Hadoop Solution, powered by Intel Xeon processors, to process billions of events every day. "We can collect and more effectively analyze data with the Dell Cloudera Apache Hadoop Solution," says Robert Scudiere, executive director of engineering for SecureWorks. "That means we're able to increase our research capabilities, which helps with our intelligence services and enables better protection for our clients."
By moving to the Dell Cloudera Apache Hadoop Solution, Dell SecureWorks can put more data into its clients' hands so they can respond faster to security threats than before.

Parlaying big data to best advantage

Effective use of big data is key to competitive gain, and Dell works with ecosystem partners to help organizations succeed as they evolve their data analytics capabilities. Cloudera plays an important role in the Hadoop ecosystem by providing support and professional feature development to help organizations leverage the open-source platform.

The combination of Cloudera software on Dell servers enables organizations to successfully implement new data capabilities on field-tested, low-risk technologies. (See the sidebar, "Taking Hadoop for a test-drive.") Dell Cloudera Hadoop Solutions comprise software, hardware, joint support, services and reference architectures that support rapid deployment and streamlined management (see figure). Dell PowerEdge servers, powered by the latest Intel Xeon processors, provide the hardware platform.

Dell Cloudera Hadoop Solutions are available with Cloudera Enterprise, designed specifically for mission-critical environments. Cloudera Enterprise comprises the Cloudera Distribution including Apache Hadoop (CDH) and the management software and support services needed to keep a Hadoop cluster running consistently and predictably. Cloudera Enterprise allows organizations to implement powerful end-to-end analytic workflows, including batch data processing, interactive query, navigated search, deep data mining and stream processing, from a single common platform.

Accelerated processing. Cloudera Enterprise leverages Hadoop YARN (Yet Another Resource Negotiator), a resource management framework designed to transition users from general batch processing with Hadoop MapReduce to interactive processing.
The Apache Spark compute engine provides a prime example of how YARN enables organizations to build an interactive analytics platform capable of large-scale data processing. (See the sidebar, "Revving up cluster computing.")

[Figure: Solution stack: Dell Cloudera Hadoop Solutions for big data. Labels shown: analytic software solutions for Hadoop; Cloudera Enterprise Data Hub; Dell PowerEdge servers and networking; Dell reference architecture; installation and configuration service; Dell Professional Services for Hadoop; Investigate, Discover, Plan, Implement.]

Built-in security. Role-based access control is critical for supporting data security, governance and compliance. The Apache Sentry system, integrated in CDH, enhances data access protection by defining what users and applications can do with data, based on permissions and authorization. Apache Sentry continues to expand its support for other ecosystem tools within Hadoop. It also includes features and functionality from Project Rhino, originally developed by Intel to enable a consistent security framework for Hadoop components and technologies.

Supporting rapid big data implementations

Dell Cloudera Hadoop Solutions, accelerated by Intel, provide organizations of all sizes with several turnkey options to meet a wide range of big data use cases.

Getting started. Dell QuickStart for Cloudera Hadoop enables organizations to easily and cost-effectively engage in Hadoop

development, testing and proof-of-concept work. The solution includes Dell PowerEdge servers, Cloudera Enterprise Basic Edition and Dell Professional Services to help organizations quickly deploy Hadoop and test processes, data analysis methodologies and operational needs against a fully functioning Hadoop cluster. Taking the first steps with Hadoop through Dell QuickStart allows organizations to accelerate cluster deployment to pinpoint effective strategies that address the business and technical demands of a big data implementation.

Going mainstream. The Dell Cloudera Apache Hadoop Solution is an enterprise-ready, end-to-end big data solution that comprises Dell PowerEdge servers, Dell Networking switches, Cloudera Enterprise software and optional managed Hadoop services. The solution also includes Dell Cloudera Reference Architectures, which offer tested configurations and known performance characteristics to speed the deployment of new data platforms. Cloudera Enterprise is thoroughly tested and certified to integrate with a wide range of operating systems, hardware, databases, data warehouses, and BI and ETL systems. Broad compatibility enables organizations to take advantage of Hadoop while leveraging their existing tools and resources.

Advancing analytics. The shift to near-real-time analytics processing necessitates systems that can handle memory-intensive workloads. In response, Dell teamed up with Cloudera and Intel to develop the Dell In-Memory Appliance for Cloudera Enterprise with Apache Spark, aimed at simplifying and accelerating Hadoop cluster deployments. By providing fast time to value, the appliance allows organizations to focus on driving innovation and results, rather than on using resources to deploy their Hadoop cluster.

The appliance's ease of deployment and scalability addresses the needs of organizations that want to use high-performance interactive data analysis for analyzing utility smart meter data, social data for marketing applications, trading data for hedge funds, or server and network log data. Other uses include detecting network intrusion and enabling interactive fraud detection and prevention.

Built on Dell hardware and an Intel performance- and security-optimized chipset, the appliance includes Cloudera Enterprise, which is designed to store any amount or type of data in its original form for as long as desired. The Dell In-Memory Appliance for Cloudera Enterprise comes bundled with Apache Spark and Cloudera Enterprise components such as Cloudera Impala and Cloudera Search.

Cloudera Impala is an open-source massively parallel processing (MPP) query engine that runs natively in Hadoop. The Apache-licensed project enables users to issue low-latency SQL queries to data stored in Apache HDFS (Hadoop Distributed File System) and the Apache HBase columnar data store without requiring data movement or transformation.

Cloudera Search brings full-text, interactive search and scalable, flexible indexing to CDH and enterprise data hubs. Powered by Hadoop and the Apache Solr open-source enterprise search platform, Cloudera Search is designed to deliver scale and reliability for integrated, multi-workload search.

Changing the game

Since its beginnings in 2005, Apache Hadoop has played a significant role in advancing large-scale data processing. Likewise, Dell has been working with organizations to customize big data platforms since 2009, delivering some of the first systems optimized to run demanding Hadoop workloads.

Just as Hadoop has evolved into a major data platform, Dell sees Apache Spark as a game-changer for interactive processing, driving Hadoop as the data platform of choice. With connected devices and embedded sensors generating a huge influx of data, streaming data must be analyzed in a fast, efficient manner. Spark offers the flexibility and tools to meet these needs, from running machine-learning algorithms to graphing and visualizing the interrelationships among data elements, all on one platform.

Working together with other industry innovators, Dell is enabling organizations of all sizes to harness the power of Hadoop to accelerate actionable business insights.

Taking Hadoop for a test-drive

How can IT decision makers determine the best way to capitalize on an investment in Apache Hadoop and big data initiatives? Dell has teamed up with Intel to offer the Dell Intel Cloud Acceleration Program at Dell Solution Centers, giving decision makers a firsthand opportunity to see and test Dell big data solutions.

Experts at Dell Solution Centers located worldwide help bolster the technical skills of anyone new (and not so new) to Hadoop. Participants gain hands-on experience in a variety of areas, from optimizing performance for an application deployed on Dell servers to exploring big data solutions using Hadoop.

At a Dell Solution Center, participants can attend a technical briefing with a Dell expert, investigate an architectural design workshop or build a proof of concept to comprehensively validate a big data solution and streamline deployment. Using an organization's specific configurations and test data, participants can discover how a big data solution from Dell meets their business needs. For more information, visit Dell Solution Centers.

Revving up cluster computing

The expansion of the Internet of Things (IoT) has led to a proliferation of connected devices and machines with embedded sensors that generate tremendous amounts of data. To derive meaningful insights quickly from this data, organizations need interactive processing and analytics, as well as simplified ecosystems and solution stacks. Apache Spark is poised to become the underpinning technology driving the analysis of IoT data.

Spark utilizes in-memory computing to deliver high-performance data processing. It enables applications in Hadoop clusters to run up to 100 times faster than Hadoop MapReduce in memory or 10 times faster on disk. Integrated with Hadoop, Spark runs on the Hadoop YARN (Yet Another Resource Negotiator) cluster manager and is designed to read any existing Hadoop data.

Within its computing framework, Spark is tooled with analytics capabilities that support interactive query, iterative processing, streaming data and complex analytics such as machine learning and graph analytics. Because Spark combines these capabilities in a single workflow out of the box, organizations can use one tool instead of traditional specialized systems for each type of analysis, streamlining their data analytics environments.

Authors

Armando Acosta is a senior product line consultant at Dell, specializing in Dell big data and Hadoop solutions.

Joey Jablonski is an enterprise technologist at Dell, focusing on the strategy, architecture and development of analytic and big data technologies.

Learn More

Hadoop Solutions from Dell
Dell Big Data
Hadoop@Dell.com

Dell, PowerEdge and SecureWorks are trademarks of Dell Inc.