Big Data 101 Webinar



Transcription:

Big Data 101 Webinar: A Functional Introduction

Today's Presenters:
Paul S. Barth, PhD, Managing Partner
Prithwi Thakuria, Big Data Practice Lead
NewVantage Partners

March 6, 2013

An Introduction

What is Big Data?

- Variety: Structured, Semi-Structured, Unstructured
- Velocity: Batch, Near Real Time, Streaming
- Volume: Terabytes, Petabytes, Exabytes
- Value: Insights, Predictive, Machine Learning

Big data is the ability to ingest and rapidly combine a variety of data arriving from different sources at different velocities, in order to create value such as enhanced analytics, deep insights, machine learning and prediction. It is the next big frontier for innovation, productivity and competition.

What Problems will it solve?

- Store and process large amounts of data for enterprise batch and near-real-time needs in a timely and acceptable manner.
- Significantly reduce the time, money and resources spent on enterprise infrastructure and business value generation.

What led us here?

[Figure: data volume growing from megabytes through gigabytes and terabytes to exabytes across four eras (the ERP Era, the CRM Era, the Web Era, the Big Data Era), while data shifts from structured to semi-structured to unstructured.]

Data types labeled in the figure: Financials, Inventory, Trading, Data Warehouses, Segmentation, Offer Details, Demographics, Dynamic Pricing, Web Logs, Click Stream Analysis, Search Marketing, Behavioral Targeting, A/B Testing, Blogs and Wikis, Social Media, Social Interactions, Mobile Web, User Content, Speech to Text, Sensors & RFID, Spatial & GPS, Sentiment Analysis, SMS/MMS, Partner Feeds.

Why is it possible now?

Big Data leverages the cost/performance of large server grids and open source software.

Platform      Cost per TB
Database      $37,000
Appliance     $5,000
Hadoop        $2,000

Problems to Opportunities

The perspective has changed from a problem of storing, processing, retrieving and analyzing data to one of business opportunity. Firms are embracing new ideas and technologies that co-exist with existing investments.

[Figure: five activities surrounding OPPORTUNITY]
- STORE: SAN; HDFS; HBase, Cassandra; Hadoop DB Store; Cloud
- COLLABORATE: Jive, Chorus, Enterprise Portals
- PROCESS: SQL, MapReduce, Pig, Hive, CloudBase
- ANALYZE: OLAP, Text/Data Mining, Social/Semantic Analysis, Visualization, Reporting
- RETRIEVE: SQL, MapReduce, Key-Value, RESTful

Creating Value in Enterprise

- Accessibility to Data: Enhanced visibility of relevant information and better transparency into massive amounts of data. Improved reporting to stakeholders.
- Decision Making: Next generation analytics can enable automated decision making (inventory management, financial risk assessment, sensor data management, machinery tuning).
- Marketing Trends: Segmentation of the population to customize offerings and marketing campaigns (consumer goods, retail, social, clinical data, etc.).
- Performance Improvement: Exploration for, and discovery of, new needs can drive organizations to fine-tune for optimal performance and efficiency (employee data).
- Innovation: Discovery of trends will lead organizations to form new business models and adapt by creating new service offerings for their customers. Intermediary companies with big data expertise will provide analytics to 3rd parties.

What are Businesses Doing with Big Data?

NVP Big Data Survey
- Over 50 Fortune 500 executives and leaders
- 70% are large financial services companies
- 65 in-depth questions benchmarking investment levels, applications, organizational structure, and skills

Key Findings
- Big Data Investments
  - 85% have big data programs planned or underway
  - 25% are spending over $10MM annually on big data
  - 36% expect to spend over $10MM in 3 years
- The primary drivers are better analytics about Customers and Risk
- Most companies are using big data to integrate existing corporate data from diverse sources
  - Not external data or advanced analytics
- 85% of the initiatives are sponsored jointly by the business and IT
- All companies are struggling to attract, grow, and retain data scientists

Banking: Bank of America, CitiGroup, JP Morgan, RBS Citizens Financial, US Bank, Wells Fargo Bank
Insurance / Health Care: Aetna, Broad Institute, Cigna, CVS/Caremark, The Hartford, Harvard Pilgrim Health Care, SunLife Financial, Travelers, United Healthcare
Government: Department of Defense, General Services Administration, Department of Health and Human Services, Social Security Administration
Investments: Bank of New York Mellon, Charles Schwab, Conning Asset Mgmt, Fidelity Investments, ING, Putnam Investments, State Street Bank, TD Ameritrade, TIAA-CREF, Wellington Financial
Financial Services / Other: American Express, Freddie Mac, General Electric (GE), MasterCard, Pitney Bowes, Thomson Reuters, VISA, Wright Express
Media and Technology: Avid Technology, Time Warner Cable

Survey results available at www.newvantage.com
Proprietary Information

Known Industry Adopters (Hadoop World 09)

Organization             Use Case
Visa                     Large scale transaction analysis
JP Morgan Chase          Data processing for financial services
China Mobile             Data mining platform for telecom industry
Rackspace                Cross data center log processing
eHarmony                 Matchmaking in the Hadoop cloud
General Sentiment        Understanding natural language
Yahoo!                   Social graph analysis
Visible Technologies     Real-time business intelligence
Facebook                 Data warehouse with Hadoop and Hive
Sears                    Mainframe migration
Crédit Mutuel Arkéa      Mainframe migration

Case I: Customer Cross-channel Path Analysis

- Load all customer activities on the big data platform
  - Web, call, branch, marketing, service, transactions
- Develop an event series for each customer over 6 months
- Identify the most common paths to sale, attrition, and outliers

Platform     Cost       Data Loading   Analytics             Processing Time
Big Data     < $1MM     2 days         25 lines of code      40 hours
Relational   > $5MM     1 month        25,000 lines of SQL   2 months

Big data benefits
- Organize the data as needed, after it is loaded
- Event series data is non-relational but simple to program
- Query and analysis are run in a single pass of the data

Case II: Operational Data for Risk Analytics

- Load operational mainframe data files on the big data platform
  - Nightly snapshots from 20+ systems: all fields, all records
- Use standard ETL tools to select data for ad-hoc requests
- Organize data into relational format for future reuse

Platform         Cost        Data Loading   Data Delivery   Processing Time
Big Data         < $1MM      2 days         1 week          2 hours
Data Warehouse   > $10MM     12 months      1 week          16 hours

Big data benefits
- All data is available for ad-hoc requests
- Data is delivered to the business while the relational database is being built
- Integration with ETL and relational tools leverages existing skills
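To make the Case I idea of an "event series" concrete, here is a minimal sketch in Python. It is not the presenters' code: the sample events, the `build_event_series` and `common_paths` helpers, and the outcome labels are all assumptions chosen for illustration. It groups activities by customer, orders them in time, and counts the most common paths that end in a given outcome.

```python
from collections import Counter, defaultdict

# Hypothetical sample events: (customer_id, timestamp, activity).
# Deliberately out of order, as raw loaded data usually is.
events = [
    ("c2", 2, "call"), ("c1", 1, "web"), ("c3", 2, "sale"),
    ("c1", 3, "sale"), ("c2", 1, "web"), ("c4", 3, "attrition"),
    ("c1", 2, "call"), ("c3", 1, "branch"), ("c2", 3, "sale"),
    ("c4", 1, "web"), ("c4", 2, "service"),
]

def build_event_series(events):
    """Group activities by customer, ordered by timestamp."""
    series = defaultdict(list)
    for customer_id, _timestamp, activity in sorted(events, key=lambda e: (e[0], e[1])):
        series[customer_id].append(activity)
    return series

def common_paths(series, outcome):
    """Count the paths that lead up to (and include) the given outcome."""
    paths = Counter()
    for activities in series.values():
        if outcome in activities:
            end = activities.index(outcome)
            paths[tuple(activities[: end + 1])] += 1
    return paths.most_common()

series = build_event_series(events)
sale_paths = common_paths(series, "sale")
for path, count in sale_paths:
    print(" -> ".join(path), count)
```

Because the grouping happens after the data is loaded, the same raw events can be reorganized for a different question (for example, paths to attrition) by calling `common_paths(series, "attrition")` without reloading anything.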

So What is Hadoop?

Hadoop is a free, Java-based data management framework from Apache that supports the processing and computation of large data sets in a distributed computing environment. It allows organizations to capture, process and share data in any format and at any scale.

- Operate: Manage, operate, gain insights and create analytics for innovation, productivity and competition.
- Integrate: Open up and promote the exchange and integration of data with new and existing enterprise applications.
- Process: Process and manage data of any size with tools to sort, filter, summarize and apply basic functions to the data.
- Capture: Ingest and store a variety of data in real time and batch.
- Programming Model: Natively redundant and distributed programming model for large data sets.
- Storage: Distributed, scalable, self-healing, high-bandwidth and portable file system that splits tasks across processors near the data.

Hadoop Capabilities

Capability areas: Capture, Integrate, Process, Core, Operate

Components: Templeton, WebHDFS, Sqoop, Flume, HCatalog, Pig, Hive, HBase, MapReduce, HDFS, Ambari, Oozie, Mahout, Zookeeper

Hadoop is a collection of many open source and commercial packages that create a data and analytics ecosystem for applications. Thousands of programmers continue to develop new capabilities.

Understanding MapReduce

MapReduce is Google's programming paradigm, or framework, which represents an approach to handling data-intensive problems in a distributed manner.

Basic notion: A computation is applied against a large number of records or partitions and intermediate results are generated (map function). Next, the intermediate results are aggregated in some fashion to produce the final outcome (reduce function).

[Figure: Partitions 1 to 5 feed Map Tasks 1 to 3, which produce Intermediate results 1 to 3; these feed Reduce Tasks 1 and 2, which produce Results 1 and 2.]

MapReduce can be successfully applied to the problem of scaling a software application through multicore processors and multiple machine cluster infrastructure (Cloud).

MapReduce Example: Face Recognition

- Spread 1 million image records across 100 servers
- Map the matching program on 100 servers, each returning the top 10 matches
- Reduce the 1,000 results to return the top 10 best matches

Step (secs)     Server    Big Data Grid: 100 Servers
Load Image      1         1
Scan Images     1,000     10
Match Images    1,000     10
Sort Top 10     1         1
Total           2,002     22
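The face recognition example above can be sketched as a pair of plain Python functions. This is a single-process illustration of the map and reduce steps, not Hadoop code and not the presenters' implementation. The `similarity` score, the synthetic records and the function names are assumptions, and the data is scaled down to 1,000 records in 10 partitions so it runs instantly.

```python
import heapq

def similarity(probe, features):
    """Hypothetical match score: negative squared distance, so higher is better."""
    return -sum((p - f) ** 2 for p, f in zip(probe, features))

def map_top_matches(partition, probe, k=10):
    """Map: score every record in one partition and keep only the k best."""
    scored = ((similarity(probe, features), image_id) for image_id, features in partition)
    return heapq.nlargest(k, scored)

def reduce_top_matches(partial_results, k=10):
    """Reduce: merge the per-partition winners into one global top-k list."""
    merged = (match for partial in partial_results for match in partial)
    return heapq.nlargest(k, merged)

# Synthetic stand-in for the image records: (image_id, feature_vector).
records = [(i, (i % 97, i % 89)) for i in range(1000)]
probe = (5, 5)

# Spread the records across 10 partitions, one per simulated server.
partitions = [records[p::10] for p in range(10)]

# On a real grid each map call would run on its own server, next to its data.
partial_results = [map_top_matches(part, probe) for part in partitions]
top10 = reduce_top_matches(partial_results)
print(top10)
```

The reduce step only ever sees 10 candidates per partition, which is why the final sort stays cheap no matter how many records each server scans. Any record in the global top 10 is necessarily in its own partition's top 10, so nothing is lost by discarding the rest during the map step.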

Enterprises are Integrating Big Data with Existing Systems

[Figure: reference architecture connecting big data sources to existing systems through a new Hadoop layer. Labels in the figure include:
- Big Data Sources: transactions, interactions, observations, information; ERM/CRM, ERP, Core Applications, Financials, Social Media, Exhaust Data, Web Logs, Public Domain, Paid Demographics
- Existing: ODS, EDW, Marts, Low Latency / NoSQL, Discovery Tools
- New / Custom Hadoop: Templeton, WebHDFS, Sqoop, Flume, HCatalog, Pig, Hive, HBase, MapReduce, HDFS, Ambari, Oozie, Mahout, Zookeeper; Operations
- Business Infrastructure: DevTools, Apps / Spreadsheets, BI / Visualization, CEP, BPM
- Customer Facing: Web Apps, Mobile Apps
- Next Generation: Deep Analytics, Collaboration, Self-Service, Un-constrained]

Confidential & Proprietary - NewVantage Partners

Challenges To Capture the Full Potential Of Big Data

Several key challenges have to be overcome, in five areas: Governance; Regulation and Security; Organizational Change and Talent; IT Delivery and Industry Structure; Supporting Technology.

- Ownership of, and access to, data.
- Traditional compliance and security tools might not fit.
- Integrating enterprise and 3rd party data has legal restrictions.
- Big Data is still in its early stages.
- There might be organizational changes required.
- Shortage of specialized analytical skills.
- New business models are needed to take advantage of accelerated analytics.
- Traditional SDLC models will limit business agility.
- Legacy infrastructure in some industry sectors limits integration.
- Political resistance and leadership buy-in.
- Technology is still evolving.
- Analytical theory to support big data is not mature.

Big Data Usage Patterns

Exploration & machine learning
  Description: Iterating on large data sets, looking for patterns and new ways to predict future trends.
  Example: AML and fraud patterns, counterparty risk analysis, e-mail and social media analytics.

Operational prediction
  Description: Big data feeds operational predictive models with new data upon which to base predictions.
  Example: Online fraud detection, market alerts, trade analytics.

Accelerating access to operational data
  Description: Store and distribute raw, semi-structured operational data for expert analysis.
  Example: Rapid response to management and regulatory questions.

Bulk data operations & extreme ETL
  Description: Batch operations on data at massive scale are conducted using parallel processing techniques.
  Example: Making data warehouse operations faster and cheaper with massive scale bulk data movements.

Stream & event analytics
  Description: Rapidly changing data are processed in parallel using complex events or more sophisticated stream filtering and mining algorithms.
  Example: Trade analytics, fraud detection, online customization, next-best product, business alerts.

Big Data Vendors

Hadoop distributors
- Contribute to Hadoop OSI or extend Apache and distribute their own flavor.
- Offer consulting and training services.
- Create additional tools to make Hadoop enterprise class.
- Examples: Cloudera, Hortonworks

Hadoop integrators
- Provide Hadoop integrations to their existing tools to access and analyze big data.
- Provide tools that make developing big data solutions easier.
- Provide solution frameworks and packages that use Hadoop under the hood.
- Examples: IBM BigInsights, Karmasphere, Informatica, EMC/Greenplum

Proprietary solutions
- Have created their own data storage and analytic platforms.
- Generally meet the characteristics of big data: MPP on top of commodity hardware.
- Examples: Teradata/Asterdata, LexisNexis HPCC, Microsoft Linq2HPC

Strategy, Execution and Delivery
- Experience with traditional solutions for BI, analytics, databases and data governance.
- Consider big data as part of overall business strategy and technical architecture.
- Established design patterns to solve use cases across industries.
- Hands-on expertise and proven methodologies.
- Example: NewVantage Partners

THANK YOU
www.newvantage.com