Success Story: Big Data Drives Profits. Brett Farrar. Founding Partner. Sendero Business Services. 2013 ATC Fall Conference



Success Stories Big Data Drives Profits October 15, 2013

Information Timeline Part I
~50,000-100,000 BC: Spoken language develops
~4000 BC: Written word is developed
1440: Johannes Gutenberg invents the printing press
1775: US Postal Service begins
1888: Richard Sears first uses a printed mailer
1936: First freely programmable computer
1961: First database invented (IDS)
1970: Relational database first defined
1981: IBM PC introduced
1984: Apple Macintosh introduced
1985: Microsoft Windows introduced
1988: IBM article coins the term "business data warehouse"
1989: World Wide Web first proposed
1994: Introduction of cookies allows for internet tracking
1997: Term "Big Data" used by NASA researchers
2005: Hadoop is created
2011: IBM's Watson computer uses Big Data to beat human competitors on Jeopardy
2012: 2.5 exabytes of data are created each day; this number doubles every 40 months

Sears Roebuck & Co. Story: Sears Becomes a Brand Powerhouse
In 1888, Richard Sears first used a printed mailer (i.e., a catalog) to sell and market the products he was offering in Sears stores. Sears used the catalog to introduce itself to Americans and to become a phenomenally successful company. Americans looked forward to getting their annual Sears catalog.
Key Enablers
- Written word
- Printing press
- Ubiquitous and cheap mail service
Observations
- Organizational commitment and insights came from the top down
- The key enablers were in place for over 100 years before a visionary took full advantage
- The catalog was risky and a big investment

Information Timeline Part II

Walmart Story: Walmart Becomes the Low Cost King
In the 1970s and 1980s, Sam Walton implemented radical cost-cutting by partnering with the IT-savvy executive Roy Mayer and his data processing protégé, Royce Chambers. Together, they overhauled the company's logistics and upgraded the computer system that tracked merchandise sales and orders. They also partnered with their suppliers to share information and further cut their supply chain costs. Walmart is now the largest company in the world.
Key Enablers
- Computers / computing power
- Databases for OLTP
- Communications and partnering with suppliers
Observations
- Organizational commitment and insights came from the top down
- They were storing, tracking, exchanging, and using big data before the term was ever coined
- Big investment in time and money before payoff

American Airlines Story: American Airlines Leads the Way in Advanced Analytics
Robert Crandall, the former CEO of American Airlines, is credited with developing the airline's frequent flyer program and Sabre reservations, and with pioneering its Operations Research and Advanced Analytics group. Consequently, AA has an organizational commitment to data capture and analysis. Its use cases for data analytics include parts allocation/location based on demand prediction, market size forecasting, supply chain optimization, plane configuration, etc.
Key Enablers
- Computers / computing power
- Centralized databases for OLTP and OLAP
- Data from third parties
- People dedicated to asking why and how
Observations
- Organizational commitment and insights came from the top down
- Rotable parts allocation resulted in savings of several million dollars per year for one fleet
- Big investment in time and money before payoff

Large Gaming Retailer Story: What Information Are You Missing?
This company needed to gain more insight into its customers, so it implemented a loyalty program that gave customers an incentive to share demographic information, contact information, preferences, etc. Before the program, the company thought its customers were male, 20-something gamers. Afterwards, it realized its best customers were really middle-income families with children. This led the company to change its marketing, store locations, game mixtures, etc., and allowed it to increase revenue and market share.
Key Enablers
- Computers / computing power
- Databases for OLTP and OLAP
- Asking what other data is needed
Observations
- Organizational commitment and insights came from the top down
- Big investment in time and money before payoff

Information Timeline Part III

Characteristics of Big Data
Big Data is usually characterized by some common attributes, although not all of them have to be present at once in a single dataset.
- Volume
  o Lots of data!
  o Petabytes and exabytes
- Velocity
  o Speed at which it is collected
  o Constantly adding new data at a high rate
  o Machine sensor data, Internet data, web-site customer behavior data, web logs, etc.
- Variety
  o Any type of data
  o Structured data (e.g., database data, CSV files), unstructured data (e.g., video files, audio files, blog entries, Twitter feeds), and semi-structured data (e.g., log files)

Implications of Big Data
The characteristics of Big Data force us to use new ways to store, manage, search, and retrieve data.
- Traditional / centralized datastores (e.g., relational databases) are not optimized to handle the characteristics of Big Data
  o Data is typically limited, minimized, and archived when using centralized datastores because performance suffers with large volumes of data
  o Centralized datastores are made to handle structured data, where the format/structure is dictated to the datastore before the data is written into it. Therefore, they do not handle unstructured or semi-structured data in a way that allows the data to be searched easily
- Decentralized datastores (e.g., the Hadoop Distributed File System)
  o Developed to handle the characteristics of Big Data
  o Hadoop has become the most popular decentralized datastore
  o Hadoop is an open-source project, much of whose early development took place at Yahoo, and it is still used extensively at Yahoo to deliver information to customers

How Big is Big Data?
1 L.O.C. (Library of Congress) = 152 million physical items = 838 miles of bookshelves = 10 terabytes of printed data

How Big is Big Data?
Twitter: 400 million tweets per day; 4 terabytes per day; 2.5 days per L.O.C.

How Big is Big Data?
1 million+ customer transactions per hour; 2.5 petabytes of data in storage; 250 L.O.C.s

How Big is Big Data?
4.7 billion searches per day in 2011; 20 petabytes of data processed per day in 2008; 2,000 L.O.C.s of data processed per day in 2008

How Big is Big Data?
42 zettabytes of words ever spoken by human beings; roughly 4,200,000,000 L.O.C.s of words ever spoken by human beings (42 zettabytes at 10 terabytes per L.O.C., using decimal prefixes)
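The L.O.C. comparisons on these slides are simple unit conversions. The following is a minimal sketch of that arithmetic, assuming decimal (SI) prefixes and the slide's figure of 10 terabytes per L.O.C.; the function and constant names are illustrative only. With decimal prefixes, 42 zettabytes comes out near 4.2 billion L.O.C.s, and with binary prefixes (1 ZB = 1024^3 TB) near 4.5 billion.

```python
# Decimal (SI) byte units. Assumption: the slides do not state whether
# decimal or binary prefixes were intended.
UNITS = {"TB": 10**12, "PB": 10**15, "EB": 10**18, "ZB": 10**21}

# From the slide: one Library of Congress (L.O.C.) = 10 terabytes of printed data.
LOC_BYTES = 10 * UNITS["TB"]


def loc_equivalents(amount, unit):
    """Return how many L.O.C.s correspond to `amount` of `unit`."""
    return amount * UNITS[unit] / LOC_BYTES


def days_per_loc(amount_per_day, unit):
    """Return how many days it takes to produce one L.O.C. at a daily rate."""
    return LOC_BYTES / (amount_per_day * UNITS[unit])


if __name__ == "__main__":
    print("4 TB per day, days per L.O.C.:", days_per_loc(4, "TB"))
    print("2.5 PB in storage, L.O.C.s:", loc_equivalents(2.5, "PB"))
    print("20 PB per day, L.O.C.s per day:", loc_equivalents(20, "PB"))
    print("42 ZB, L.O.C.s:", loc_equivalents(42, "ZB"))
```

Keeping the unit table separate from the conversion makes it easy to swap in binary prefixes and see how much the large figures move.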

Centralized vs. Decentralized Datastores
- Centralized: compute (CPU & memory) and storage are separate
- Decentralized: compute (CPU & memory) and storage are together

Centralized vs. Decentralized Datastores
- Centralized: vertical scaling
- Decentralized: horizontal scaling
Implications
- RDBMS scalability is limited by internal capacity
- RDBMS single points of failure (SPOF) require high-end hardware and software
- Big Data scales by adding servers
- Big Data leverages commodity servers

Centralized vs. Decentralized Datastores
Implications
- Both prevent data loss due to failures
- Big Data can shift workload to under-utilized servers
- All queries go through a single RDBMS engine

Centralized vs. Decentralized Datastores
- Centralized: data to the query
- Decentralized: query to the data
Implications
- RDBMS sends large raw data across a network connection, thus slowing performance
- Big Data minimizes network traffic by sending the query to the data and compute node
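The difference between "data to the query" and "query to the data" can be sketched in a few lines. This is an illustrative toy, not Hadoop code: the partitions, record layout, and function names are all hypothetical, and "network traffic" is approximated by counting the items that would have to leave the nodes.

```python
# Hypothetical data: three "nodes", each holding one partition of
# (store, amount) sales records.
PARTITIONS = [
    [("north", 120.0), ("south", 80.0), ("north", 40.0)],
    [("south", 15.0), ("east", 60.0)],
    [("north", 10.0), ("east", 5.0), ("east", 25.0)],
]


def data_to_query(partitions, store):
    """Centralized style: ship every raw record to one engine, then filter and sum."""
    shipped = [rec for part in partitions for rec in part]  # all raw data moves
    total = sum(amount for s, amount in shipped if s == store)
    return total, len(shipped)


def local_sum(partition, store):
    """Work that runs on the node holding the partition."""
    return sum(amount for s, amount in partition if s == store)


def query_to_data(partitions, store):
    """Decentralized style: send the query to each node, ship back only partial results."""
    partials = [local_sum(part, store) for part in partitions]  # one number per node moves
    return sum(partials), len(partials)


if __name__ == "__main__":
    print("data to query  (total, items moved):", data_to_query(PARTITIONS, "north"))
    print("query to data  (total, items moved):", query_to_data(PARTITIONS, "north"))
```

Both functions return the same total; only the number of items crossing the (simulated) network differs, and the gap widens as partitions grow.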

Centralized vs. Decentralized Datastores
[Charts: scaling versus amount of data, centralized and decentralized]
Implications
- RDBMS has diminishing performance improvements as more data is added; more money is spent to try to maintain performance
- Big Data performance scales almost linearly with investments and amount of data

Centralized vs. Decentralized Datastores
- Centralized: structure at store time (Extract, Transform (apply structure), Load)
- Decentralized: structure at read time (Extract, Load, Read & Transform (apply structure))
Implications
- RDBMS requires you to know how data will be used/classified at collection time
- Big Data more quickly and easily allows new insights to be tested because the structure is not applied until read time
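A small sketch may help contrast structure at store time with structure at read time. The log lines, field names, and helper functions below are hypothetical and stand in for a real ETL pipeline or Hadoop job; the point is only that a question nobody planned for can still be asked when the raw data was kept.

```python
import json

# Hypothetical raw web-log lines (semi-structured data).
RAW_LOG = [
    '{"user": "a", "page": "/fares", "ms": 120}',
    '{"user": "b", "page": "/home", "ms": 45, "referrer": "search"}',
    '{"user": "a", "page": "/fares", "ms": 300}',
]


def store_time_structure(lines, columns):
    """Structure at store time: keep only the columns chosen up front.

    Fields outside `columns` are dropped when the data is loaded.
    """
    return [tuple(json.loads(line).get(c) for c in columns) for line in lines]


def read_time_structure(lines, extract):
    """Structure at read time: raw lines are kept as-is and each new
    question supplies its own `extract` function when it reads them."""
    return [extract(json.loads(line)) for line in lines]


# Schema fixed at load time: only user and page survive.
table = store_time_structure(RAW_LOG, ("user", "page"))

# A later question about referrers can still be answered from the raw lines,
# but not from `table`, which never stored that field.
referrers = read_time_structure(RAW_LOG, lambda rec: rec.get("referrer"))

if __name__ == "__main__":
    print(table)
    print(referrers)
```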

Implications of a Decentralized Datastore
Because of the architecture provided by a decentralized database and the use of commoditized servers, new things are possible that were not possible previously.
- Cost per terabyte goes down by a factor of 10+
  o Economics now allows for the saving and analysis of trending/historical data that was previously discarded
  o Can now save and analyze external data and data that was never saved before
- Can store, process, search, and retrieve any kind of data (unstructured, structured, or semi-structured)
  o Allows exploration of data that was unavailable or too difficult/costly before
  o Allows mashups of unstructured or semi-structured data with structured data (e.g., data from transactional processing systems)
- Fail fast
  o Centralized databases require development of ETL scripts that require significant design, development, and testing time/cost by IT resources; therefore, it took a long time and a lot of money to validate hypotheses
  o Hypotheses can be validated much more quickly and quite often without IT involvement

Travel Industry Success Story: Comparing Airline Flight Fares
A small web-based company (35 employees) in the travel industry provides comparisons of airline fares across carriers. It started an IT-driven initiative 18 months ago with a small Hadoop cluster (5 nodes) to analyze user behavior data on its website. This analysis helped to improve its SEO results (increasing traffic by over 100%). It was so successful that the Business partnered with IT to build a second, much larger Hadoop cluster (50 nodes) to combine (1) fare data, (2) flight schedule data, and (3) seat availability data. This has greatly improved the quality of the data offered to customers and improved customer satisfaction. The company has now reorganized to better align with its new data-focused strategy.
Key Enablers
- Hadoop
- Data from third parties
- People dedicated to asking why and how
- Partnering across the organization
- Fail fast concept
Observations
- Organizational commitment and insights came from the bottom up
- Small investment in time/money for pilot; started small and cheap to get quick wins
- This is a small company (Big Data is not just for big companies)

ETL Modernization
Business Problem
- Extract-Transform-Load (ETL) jobs taking too long to run and exceeding the batch processing window
- ETL jobs not completing successfully and preventing business-critical data from being available where it is needed
Solution
- Hadoop solution implemented to replace some (or all) of the offending ETL jobs
- Nothing changes with the source systems (e.g., transactional processing systems) and destination systems (e.g., Enterprise Data Warehouse)
- Data movement and transformation from source systems to destination systems speeds up tremendously (quite often by a factor of 2-3+)
- Easy use case that quickly achieves projected ROI and cost-justifies bigger and more aspirational use cases
- Great way to start doing Hadoop / Big Data and to build skills

Risk Mitigation
Business Problem
- Companies with critical historical data in a proprietary format, or running critical proprietary applications, where the supporting vendor is no longer in business or the application is no longer supported
- These companies are at risk if they run into an issue with the data or application and cannot get the support required to resolve it
- The data for these applications is stored on high-end, expensive storage devices (e.g., SAN) to ensure its availability
Solution
- Hadoop solution implemented to store the data in a much more cost-effective way (can be as little as 1/10 to 1/3 the cost)
- Data can now be searched in new ways
- Easy use case that quickly achieves projected ROI and cost-justifies bigger and more aspirational use cases
- Great way to start doing Hadoop / Big Data and to build skills

Supply Chain and Logistics
Business Problem
- Manufacturers need just-in-time availability of components
- Stock-outs cause harmful production delays
- Sensors and RFID tags reduce the cost of capturing more supply chain data, which needs storage and processing
Solution
- Big Data architecture stores unstructured, streaming, dirty sensor data
- Manufacturers get lead time to make alternative arrangements for supply chain disruptions
- Prevent stock-outs, reduce supply chain costs, and improve margins for the finished product

Assembly Line Quality Assurance
Business Problem
- High-tech manufacturing uses sensors to capture data at critical steps in the manufacturing process
- Sensor data helps diagnose errors with returned products
- Much data is discarded because of high storage costs
- Lean margins mean small budgets for data analysis
Solution
- Big Data architecture stores unstructured, streaming, dirty sensor data
- Manufacturers can proactively analyze more data, over a longer time, to detect subtle issues that would otherwise go undetected
- Sensor data managed with a Big Data architecture can help a manufacturer reduce warranty costs and earn a reputation for quality

Proactive Maintenance
Business Problem
- Today's manufacturing workflows involve sophisticated machines coordinated across pre-defined, precise steps
- One machine malfunction can stop the production line
- Premature maintenance has a cost; there is an optimal schedule for maintenance: not too early, not too late
Solution
- Big Data architecture stores unstructured, streaming, machine data
- Manufacturers can derive optimal maintenance schedules, based on real-time information and historical data
- Maximize equipment utilization, minimize P&E expense, and avoid surprise work stoppages

Crowdsourced Quality Assurance
Business Problem
- Thoroughly tested products still have post-sale problems
- Customers may not report problems to the manufacturer, but still complain about the product using social media
- This social stream of data on product issues can augment product feedback from typical support channels
Solution
- Big Data architecture stores huge volumes of social media sentiment data
- Manufacturers can mine this data for early signals on how a product holds up after delivery to the customer
- Learn about issues quickly and take early action to protect the product reputation and win customer loyalty

Other Use Cases

The Big Data Landscape v1

The Big Data Landscape v2

Summary
Big Data only describes a problem or situation. To ensure that you get the most value out of your Big Data projects, you need to do the following:
- Craft a goal-oriented plan with quick wins
- Fail fast
- Use the data you already have
- Don't limit yourself to only the data that you have
- Partner effectively across organizations
- Commit to operating the business differently
- Ask why and what else
- Choose the right tool in your toolbox (RDBMS, Hadoop, etc. all have their place)

Q&A
E-Mail: Bret.Farrar@SenderoCorp.com