Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth




MAKING BIG DATA COME ALIVE
Steve Gonzales, Principal Manager
steve.gonzales@thinkbiganalytics.com


Understanding Big Data Buzzwords
- Big Data: lots of data
- Data Lake: an architecture strategy that handles Big Data
- Predictive Analytics: teaching a computer/database to learn and be able to answer questions that you don't yet know about (e.g. when you don't know what you don't know)
- Hadoop: a top-level Apache project (open source) that provides Big Data storage & retrieval capabilities. Just one component in a Data Lake.
- Big Data Architect: a BI expert who lives in Silicon Valley

Big Data Requires New Architecture
- Data types: Structured Data, Unstructured Data
- Today's World: Applications = Dashboards of what happened yesterday, last week or last month with manual responses.
- Systems Integration and Application Development
- New World: Applications = Predictions of what is going to happen in the future with automated responses.

Building Blocks
- Business Problem/Opportunity
- Technology Vendor(s)
- Data Storage
- Querying
- Presentation
- Governance
- Business Value

Business Problem/Opportunity
- If your business has no issues, you really don't need Big Data
- Big Data can be one of the most powerful enablers of growth at your company
- Are there business analysts that thrash with Data Mart queries?
- Can you access data from all sources, channels and systems easily?

Technology Vendors
- Big Data is complex
- Most companies get help from multiple Technology Vendors
- Cloud or On Premise
- Pick a Distribution Vendor
- Are you willing to create Big Data development, project management and sustainment organizations?
 - Big Data Manager/Director
 - Architect
 - Development Manager
 - Operations Manager
 - Engineering Manager

Data Storage
- HDFS
- NoSQL
 - HBase
 - MongoDB
 - Cassandra
- RDBMS
 - Oracle
 - SQL Server
 - MySQL
- On Premise vs. Cloud

Querying
- Profile Users
 - Casual data consumer
 - Business analyst
 - Data Scientist
- Tools
 - MapReduce/Java
 - Hive
 - Pig
 - Perl
 - Python
- Key takeaway: leverage the skill sets that you already have within your organization
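For teams whose existing skill set is Python rather than Java, the MapReduce model listed under Tools can be tried out without a cluster. The following is a minimal in-memory sketch of the map, shuffle and reduce phases for a word count. It does not use Hadoop; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample records are illustrative assumptions, not part of the original deck.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce over an iterable of text records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input, two short records.
    sample = ["big data is complex", "big data requires new architecture"]
    print(word_count(sample))
```

On a real cluster the shuffle step is handled by the framework and the map and reduce functions run in parallel across nodes; the sketch only shows how the three phases fit together.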

Presentation
- Profile Users
 - Casual data consumer
 - Business analyst
 - Data Scientist
- Tools
 - Tableau
 - Splunk
 - QlikView
 - Traditional OLAP (e.g. MicroStrategy, Business Objects, Cognos)

Governance/Business Involvement
- IT cannot do it all by themselves
- These roles must be defined:
 - Business Owners
 - Business SMEs
 - Metadata Management
 - Product Owner
- During the first phases of build out, these roles will be nearly full time

Business Value
- Build a Business Case
- Identify an Executive Sponsor
- Define Success Criteria for each project/milestone
- After the project is successful, market your wins

Real World Examples
Digital Advertising Pricing Manufacturing Traceability Enterprise Social Media analytics

Facebook
- Facebook developed and open sourced both Hive and Presto
- Still uses Hive for reliable computations and large-scale batch processing, optimized for overall system throughput
- Presto for interactive and distributed SQL
 - Presto runs 10x better than Hive at Facebook in terms of CPU efficiency and latency
 - Extended to query key engines in the Facebook data ecosystem: Cassandra, Hive, JMX, Kafka, MySQL, PostgreSQL, and others
 - HTTP + JSON support for various languages (Python, Ruby, PHP, and others)
 - Supports a large subset of ANSI SQL for complex queries, joins, aggregations, and various functions
- Presto is actively used by over a thousand Facebook employees, who run more than 30,000 queries processing over a petabyte daily.

"The reality is that at Facebook, we have a lot of data in our data warehouse, and other data spread across different systems, so being able to analyze the data across a single interface is very useful" - Martin Traverso, Facebook Engineer
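The idea of analyzing data in different systems through a single SQL interface can be illustrated on a small scale. The sketch below does not use Presto. As a stand-in, it uses Python's built-in `sqlite3` module and SQLite's `ATTACH DATABASE` statement to join tables that live in two separate in-memory databases with one query. The database alias `events_db`, the table names and the sample rows are all hypothetical.

```python
import sqlite3


def build_connection():
    """Create two separate in-memory databases reachable from one connection."""
    conn = sqlite3.connect(":memory:")
    # A second, independent in-memory database attached under its own name.
    conn.execute("ATTACH DATABASE ':memory:' AS events_db")
    conn.execute(
        "CREATE TABLE main.users (user_id INTEGER PRIMARY KEY, country TEXT)"
    )
    conn.execute(
        "CREATE TABLE events_db.page_views (user_id INTEGER, views INTEGER)"
    )
    conn.executemany(
        "INSERT INTO main.users VALUES (?, ?)",
        [(1, "US"), (2, "DE"), (3, "US")],
    )
    conn.executemany(
        "INSERT INTO events_db.page_views VALUES (?, ?)",
        [(1, 10), (2, 4), (3, 6), (1, 5)],
    )
    return conn


def views_by_country(conn):
    """Join across both databases in a single SQL statement."""
    return conn.execute(
        """
        SELECT u.country, SUM(v.views) AS total_views
        FROM main.users AS u
        JOIN events_db.page_views AS v ON v.user_id = u.user_id
        GROUP BY u.country
        ORDER BY u.country
        """
    ).fetchall()


if __name__ == "__main__":
    print(views_by_country(build_connection()))
```

A distributed engine applies the same principle across heterogeneous stores, addressing each source by name within one query, whereas this sketch stays inside a single process.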

Netflix
- The company's chief content officer said that Netflix uses really big data to pick which shows to produce and how to promote them.
- Tata Consultancy Services surveyed 1,217 executives from large companies (revenue of more than $1 billion) in a dozen global industries in North America, Europe, Asia-Pacific, and Latin America.
- They found that companies with huge investments in Big Data are generating excess returns and gaining competitive advantages, putting companies without significant investments in Big Data at risk.
- Hadoop use cases at Netflix break out along the lines of:
 - Batch jobs: ETL, reporting, and analytics (Pig, Hive)
 - Interactive jobs (Presto)

"We are especially focused on performance and ease of use, with initiatives including Presto integration, Spark, and our Big Data Portal and API." - Kurt Brown, Netflix

Airbnb
- Airbnb has 23 data scientists using Hive and >500 users across the business
- Migrated workloads that were historically done on Amazon Redshift to:
 - Reduce the effort associated with ETL
 - Overcome limitations of international character sets
- Open sourced Airpal in March 2015, a web-based data exploration and SQL query interface for Presto

"Presto is an integral part of the Airbnb data infrastructure stack with hundreds of employees running queries each day with the technology. We are excited to see Teradata joining the Presto open source community and are encouraged by the direction of their contributions." - James Mayfield, product lead, Airbnb

Device Data Analytics
The number and types of devices that emit data are growing at an unprecedented rate. Every second, these devices generate Big Data. Data Analytics solutions can enable companies to use this data to analyze and improve processes and to predict trends and failures. The data can also provide insight for product development and sales teams, who use the information to improve their features and increase revenues, respectively.
ROI


Appendix

Sample Detailed Architecture

Data Lake Use Cases
Diagram labels: Analytical Sources, Operational & Analytical Sources, Other Sources, Transformation, Discovery, Archive, Reporting, Modeling, Data Hub

Foundation of a managed Data Lake - Metadata
Reservoirs require the presence of metadata that enables non-subject-matter experts to easily know the location of, and entitlements to, the various forms of stored data within.
Schema Metadata is always a given, but Operational Metadata & Business-Security metadata are critical to governance:
- Where did it come from?
- What is the data serialization?
- Who owns the data? Who can see the data? Who belongs to which group?
- What is the environment? (landing zone, OS, Line of Business)
- What processes touched my data?
- Did you lose any data? (Checksums)
- When did the data get ingested or transformed?
- Did it get exported or archived? When, where, and how will it be used (organizational)?
- How are you tracking lineage? (Checkpoints, etc.)
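The governance questions above map naturally onto a per-dataset metadata record. The following is a minimal sketch, assuming a single in-process record per ingested payload; the class `OperationalMetadata`, its field names and the helper functions are hypothetical and only illustrate how origin, serialization, ownership, environment, checksum, ingestion time and lineage could be captured together.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class OperationalMetadata:
    source_system: str      # Where did it come from?
    serialization: str      # What is the data serialization?
    owner: str              # Who owns the data?
    readers: list           # Who can see the data?
    environment: str        # Landing zone, OS, line of business
    checksum: str           # Did you lose any data?
    ingested_at: datetime   # When did the data get ingested?
    lineage: list = field(default_factory=list)  # What processes touched it?

    def record_step(self, process_name):
        """Append one processing step to the lineage trail."""
        self.lineage.append(process_name)


def sha256_of(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a payload."""
    return hashlib.sha256(payload).hexdigest()


def register(payload, source_system, serialization, owner, readers, environment):
    """Build a metadata record for a payload at ingestion time."""
    meta = OperationalMetadata(
        source_system=source_system,
        serialization=serialization,
        owner=owner,
        readers=list(readers),
        environment=environment,
        checksum=sha256_of(payload),
        ingested_at=datetime.now(timezone.utc),
    )
    meta.record_step("ingest")
    return meta


def is_intact(meta, payload):
    """Check a payload against the checksum recorded at ingestion."""
    return sha256_of(payload) == meta.checksum


if __name__ == "__main__":
    # Hypothetical payload and attribute values.
    data = b"id,value\n1,42\n"
    meta = register(data, "crm", "csv", "sales-ops", ["analysts"], "landing-zone")
    meta.record_step("transform")
    print(meta.lineage, is_intact(meta, data))
```

In a managed data lake these records would live in a shared metadata store rather than in memory, but the fields to capture are the same ones the slide asks about.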