BIG DATA: BIG OPPORTUNITY OR BIG HEADACHE? Peter Dorrington SAS
FIRST, A FEW WORDS ABOUT SAS (who do, after all, pay my salary) (Post-conference narrative annotations to this presentation are in green italics)
2011 PERFORMANCE A LEADING PROVIDER OF ADVANCED ANALYTICS SOFTWARE For 37 years, we have focused on giving our customers. 12% growth in total revenue in 2011. 36 consecutive years of revenue growth. 24% of 2011 revenues invested in R&D. Being privately owned means we can afford to reinvest in R&D rather than focus on quarterly share prices and dividends.
WHY DOES SAS CARE ABOUT BIG DATA?
THE VISION WE HAVE ALWAYS UNDERPINNED DECISION-MAKING Organizations are inundated with data: terabytes and petabytes of it. To put it in context, 1 terabyte can hold about 2,000 hours of CD-quality music, and 10 terabytes could store the entire US Library of Congress print collection. Exabytes, zettabytes and yottabytes are definitely on the horizon. The hopeful vision of big data is that organizations will be able to harvest and harness every byte of relevant data and use it to make the best decisions. Big data technologies support not only the ability to collect large amounts of data but, more importantly, the ability to understand it and take advantage of its full value. This is the vision; the reality is somewhat different.
BUT DO YOU REMEMBER THIS? SINGLE CUSTOMER VIEW (SCV) "A complete SCV is not currently available in any of the interviewed organisations. Most have a partial implementation of some of the data and/or some of the channels..." (from a study this year). When I joined SAS UK as Head of CRM in 2000, this was already old news. Over a decade later, with all the advances in data management and analytics, it is still an issue. The danger is that Big Data will make the challenge greater by adding new data sources and aspirations before we have fully got to grips with our current reality. Source: A Market Study by Henley Business School in association with SAS UK and Ireland
OUR PERSPECTIVE BIG DATA IS RELATIVE, NOT ABSOLUTE Big Data is when the volume, velocity and variety of data exceed an organization's storage or compute capacity for accurate and timely decision-making. The explosion of data isn't new; it continues a trend that started in the 1970s. What has changed is the velocity of growth, the diversity of the data and the imperative to make better use of information to transform the business. Big data is really just more data, from more sources. Most organizations already have large data. (I regularly use Companies House data of 5.3 million rows, far more than Excel can deal with. Some of our customers are using 5.3 billion rows of data, and doing so very effectively.)
BIG DATA SOURCES BIG DATA IS EVERYWHERE Which of the following data types are you collecting as Big Data and/or using today? Structured data (tables, records); semi-structured data (XML and similar standards); complex data (hierarchical or legacy sources); event data (messages, usually in real time); unstructured data (human language, audio, video); social media data (blogs, tweets, social networks); web logs and click streams; spatial data (long/lat coordinates, GPS output); machine-generated data (sensors, RFID, devices); scientific data (astronomy, genomes, physics); other. It's happening already; a significant challenge will be working out how to manage all these sources. Based on 450 responses from 109 respondents who report practicing Big Data analytics; 4.1 responses per respondent on average. Source: TDWI Big Data Analytics Report, 4th Quarter 2011, Philip Russom
THE SCALE OF THE CHALLENGE It's not hard to imagine a future of super-cheap, ubiquitous, connected "chips with everything"; the data growth curve is potentially exponential. Will the future be Even Bigger Data? But how much of this data is going to be useful in any given context? Source: IDC Digital Universe Study, sponsored by EMC, May 2010
DATA SIZE SO WHAT? NOT ALL DATA IS EQUAL VOLUME - terabytes, petabytes and up; VARIETY - from all kinds of sources, and often without context or clear value; VELOCITY - some historic, others real-time; VARIABILITY - in fits-and-starts as well as smooth-flowing; COMPLEXITY - and also of dubious quality. The challenge, today and in the future, is to find relevance within this data deluge.
IMPLICATION WE WILL NEED TO RETHINK DATA MANAGEMENT From standalone disciplines to integrated processes, where data integration, data quality, metadata management and data governance are designed and used together. The traditional extract-transform-load (ETL) approach, augmented with one that minimizes data movement and improves processing power. - There is no meaningful way we can store all this data (with today's technologies), never mind build an OLAP cube from it. - For example, the Large Hadron Collider at CERN produces 15 petabytes of data per year: they can only store a subset of this, and only by distributing the storage around the world using multiple hubs. - Now add real-time data feeds into the mix.
BIG DATA & ANALYTICS Data without analysis has only transactional value
BIG DATA, BI AND ANALYTICS TRADITIONAL VIEW MY DEFINITIONS:
Predictive (Proactive) Analytics: Big Analytics
- Optimisation: How do we do things better? What is the best decision?
- Predictive Modelling: What will happen next? How will it affect me?
- Forecasting: What if the trend(s) continue?
- Statistical Analysis: Why is it happening? What am I missing?
Reactive Analytics (Business Intelligence): Business Intelligence (BI)
- Alerts: When should I react? What action is needed now?
- Query Drilldown (OLAP): Where exactly? How do I find the answers?
- Ad Hoc Reports: How many? How often?
- Standard Reports: What happened? When?
Large Data: pretty much all organisations have large data. All have a role to play.
BIG DATA, BI AND ANALYTICS WHAT CHANGED?
- Predictive Analytics: Big Analytics becomes Big Data Analytics
- Reactive Analytics: Business Intelligence (BI) becomes Big Data BI
- Large Data becomes Big Data
Not much has changed when moving from Large Data to Big Data: BI is still BI, and Analytics is still Analytics. Applying BI to Big Data does not make it inherently analytical.
OUR PERSPECTIVE BIG DATA ANALYSIS: A PROMISE AS YET ONLY PARTIALLY FULFILLED / ADOPTED "There's gold in them thar hills." The true value of big data lies not just in having it, but in harvesting it for fast, fact-based decisions that lead to real business value. - Just like mining for gold (a deliberate pun about data mining), you have to work for the reward; it is rarely found just lying on the surface, and if it were, it wouldn't be rare and therefore valuable. - The problem with low-hanging fruit is that everyone can see it and reach for it, your competition included. Your unique Intellectual Property (what you know, and what you know about what you know) may be the only thing that ultimately sets you apart.
SO WHAT'S STOPPING YOU? 10 ROADBLOCKS TO IMPLEMENTING BIG DATA ANALYTICS All of these are solvable:
1. Budget - Develop a plain English business case with value
2. IT know-how - Figure out what you need to do, then what capabilities are needed & how they are obtained
3. Business know-how - Partner with those who do have the skills
4. Data clean-up - Face up to the problem & prepare to invest
5. The storage bulge - Store only what you have to or what is relevant
6. New data centre workloads - Monitor & analyze workloads and plan accordingly
7. Data retention - (see Storage Bulge)
8. Vendor role clarification - Identify who can offer more than canned analyses / reports, and partner with them
9. Business and IT alignment - Design a strategy around business, not IT, goals & objectives
10. Developing new talent - If at all possible, develop in-house talent using consistent architectures, rather than buy in skills
Source: Mary Shacklett, TechRepublic, Nov 2012
HEALTH WARNING BYOD: BRING YOUR OWN DATA? FINANCE DIRECTOR, SALES DIRECTOR, OPERATIONS DIRECTOR: How many views of how many data sources, using how many tools on how many devices? - Imagine what would happen if the whole leadership team turned up to a meeting with their own sets of data. - Implement a strategy that provides a consistent data foundation. - Bring Your Own View of one set of Data.
THE VALUE OF BIG DATA
OPPORTUNITY OR THREAT? WHAT BUSINESS LEADERS SAY ABOUT BIG DATA We should probably ask: strength or weakness? - Opportunities and threats are often external, not under our control. - Strengths and weaknesses are internal: we decide where we want to be strong. - Perhaps the internal debate should be about how the value of big data can provide an organisation with new strengths. - In particular, proprietary IP based on data is very hard for competitors to replicate, whereas products typically are not.
EXAMPLE MARKETING & CUSTOMER ACQUISITION Same / better result for less investment - This has been going on for years: by understanding customers and segments better, we can focus our investment on just those most likely to respond; this lift in response rates improves RoI. - Big data has the potential to tell us more about customers and to develop better models for more customers.
THE POTENTIAL THE UK'S CORPORATE GOLD
IMPLICATION IMPACT EVERY PAGE OF THE ANNUAL REPORT
- Cut losses from fraud by 30% in retail banking.
- Improved retention rates by ~40% and increased product holding by customers by 10% (Retail).
- Increased the number of customers by 1.7m per annum, contributing to a 15% compound annual growth rate in just 2 years.
- Increased sales by 40% by identifying customer sales opportunities and matching the best salespeople to close them.
- Increased customer purchases by 65% through data integration and effective targeting.
- Maintained bad debt at <0.05%, compared to the industry norm of 3.45%.
- Reduced the number of financial reports by 82%, providing key fiscal information for rapid decision-making.
The applications of analytics to business challenges and opportunities are not restricted to just one function.
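The RoI effect of lift can be made concrete with a little arithmetic. The sketch below is a hypothetical illustration, not a SAS method or a real campaign: the `Campaign` class and `lift` function are invented for this example, and the contact counts, response rates, cost per contact and value per response are all assumed figures, chosen only to show how contacting fewer, likelier responders can raise RoI.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    """Hypothetical campaign; every figure passed in is an assumption."""
    contacts: int              # customers contacted
    response_rate: float       # fraction of contacts who respond
    cost_per_contact: float    # cost of contacting one customer
    value_per_response: float  # revenue from one response

    def responses(self) -> float:
        return self.contacts * self.response_rate

    def cost(self) -> float:
        return self.contacts * self.cost_per_contact

    def roi(self) -> float:
        # RoI = (revenue - cost) / cost
        revenue = self.responses() * self.value_per_response
        return (revenue - self.cost()) / self.cost()


def lift(targeted: Campaign, baseline: Campaign) -> float:
    """Response-rate lift of a targeted campaign over a baseline campaign."""
    return targeted.response_rate / baseline.response_rate


# Assumed numbers: contact everyone vs. only the 30% a model scores as most likely to respond.
mass = Campaign(contacts=100_000, response_rate=0.01, cost_per_contact=1.0, value_per_response=150.0)
targeted = Campaign(contacts=30_000, response_rate=0.03, cost_per_contact=1.0, value_per_response=150.0)

print(f"mass RoI: {mass.roi():.2f}, targeted RoI: {targeted.roi():.2f}, lift: {lift(targeted, mass):.1f}x")
```

With these assumed figures the targeted campaign gets 900 responses instead of 1,000, but for less than a third of the spend, so its RoI is several times higher: the "same / better result for less investment" on the slide.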
WHAT IS YOUR ISSUE?
- Time (market opportunity): when you need to be able to reach a decision and act faster than the competition. Observe, Orient, Decide, Act: the OODA loop of Colonel John Boyd.
- Confidence: when you need to consider lots of scenarios.
- 100% sampling: when you need to see the whole picture, not just a sample of it.
HOW TO BIG DATA ANALYTICS (Based on SAS technologies)
THE ANALYTICS LIFECYCLE THERE IS STILL A PLACE FOR A STRUCTURED APPROACH
Central question: How can we create strategic advantage?
Lifecycle: Identify / formulate problem, data preparation, data exploration, transform & select, build model, validate model, deploy model, evaluate / monitor results.
- BUSINESS MANAGER: Domain Expert; Makes Decisions; Evaluates Processes and ROI
- BUSINESS ANALYST: Data Exploration; Data Visualization; Report Creation
- DATA MINER / STATISTICIAN: Exploratory Analysis; Descriptive Segmentation; Predictive Modeling
- IT SYSTEMS / MANAGEMENT: Model Validation; Model Deployment; Model Monitoring; Data Preparation
In my opinion, you start by identifying the question you need answered.
DISTRIBUTED COMPUTING Almost all Big Data solutions run in grid environments, chunking up the task to share it across many processors.
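Chunking up a task typically follows a split, compute, combine pattern (the map-reduce style). The sketch below is a minimal, hypothetical illustration, not how SAS grid software works: the function names `chunk`, `partial_stats` and `grid_mean` are invented, and a `ThreadPoolExecutor` from Python's standard library stands in for the grid nodes. In a real grid each chunk would go to a separate machine, and for CPU-bound work in CPython you would use processes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def chunk(data: Sequence[float], n_chunks: int) -> List[Sequence[float]]:
    """Split data into at most n_chunks contiguous slices of near-equal size."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(part: Sequence[float]) -> Tuple[int, float]:
    """Work done on each 'node': count and sum of its own chunk only."""
    return len(part), sum(part)


def grid_mean(data: Sequence[float], workers: int = 4) -> float:
    """Mean of data, computed by farming chunks out to workers and combining."""
    if not data:
        raise ValueError("data must not be empty")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_stats, chunk(data, workers)))
    # Combine step: only small (count, sum) pairs come back, not the raw rows.
    total_n = sum(n for n, _ in partials)
    total = sum(s for _, s in partials)
    return total / total_n


print(grid_mean(list(range(1, 101))))  # 50.5
```

Each worker returns a (count, sum) pair rather than its own mean because chunks can differ in size (here 10 rows split 4/4/2 with three chunks), and simply averaging the chunk means would then give a biased answer.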
IN-DATABASE ANALYTICS Doing the analytics in the database keeps it close to the data and in an easily managed environment
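The difference between moving data to the analysis and moving the analysis to the data can be sketched with Python's built-in `sqlite3` module. This is only an analogy under stated assumptions: SQLite stands in for an enterprise data warehouse, and the `sales` table, its rows and both function names are hypothetical. The first function pulls every row out before aggregating; the second sends a `GROUP BY` to the database so that only the aggregated result leaves it.

```python
import sqlite3
from typing import Dict


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical sales table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("North", 120.0), ("North", 80.0), ("South", 50.0), ("South", 70.0), ("South", 30.0)],
    )
    return conn


def totals_by_moving_data(conn: sqlite3.Connection) -> Dict[str, float]:
    """Pull every row out of the database, then aggregate in the application."""
    totals: Dict[str, float] = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_in_database(conn: sqlite3.Connection) -> Dict[str, float]:
    """Push the aggregation into the database; one row per region comes back."""
    rows = conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region")
    return dict(rows.fetchall())


conn = make_demo_db()
print(totals_in_database(conn))  # {'North': 200.0, 'South': 150.0}
```

Both functions return the same totals; with five rows the difference is invisible, but with billions of rows the second transfers only one row per region.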
IN-MEMORY ANALYTICS ARCHITECTURE But doing analytics in-memory allows for vast improvements in speed and enables "train of thought" development of new questions and answers.
USAGE EXAMPLES
WHAT IF YOU COULD... predict the buying behavior and decision criteria of your prospects weeks before your competition... gain first-mover advantage by introducing new products and services to micro-segments that haven't been identified by anyone... evaluate the impact of your marketing campaigns hourly and make adjustments in real time... improve customer experience scores that grow products per customer, reduce attrition, and leverage the power of customer recommendations for new business?
RETAIL Big, general-purpose retailers have tens of thousands of SKUs across tens of stores; having the right amount and mix of stock at the right price is critical to protecting (slim) margins. The challenge is to adjust pricing as quickly as the market changes: not monthly or weekly, but daily or even hourly.
TELCO Two big issues: the market is saturated (very few new customers) and commoditized (customers are driven by price and customer experience). Network failures directly impact the latter, whereas just providing the infrastructure does not make money. Some Telcos are looking at their IP and working out how they can use it to grow new revenue streams.
HEALTH CARE In a recent case, the DNA of an MRSA bacterium was sequenced in 48 hours at a cost of £50; we are much closer to personalised health plans than many would think. Even leaving the genetic issue to one side, it is possible to use analytics to predict healthcare needs, and therefore opportunities to intervene before the chronic becomes acute.
BANKING If you have ever had a credit card transaction declined, then you will know that card issuers are working hard to identify 100% of potential fraud whilst at the same time not generating false positives: declining genuine transactions because the detection models are incomplete or unresponsive to individual consumer behaviour is bad for business.
PUBLIC SAFETY Much of what goes on in this sector is, quite rightly, kept under wraps, but there are case studies from all over the world in which police forces are starting to anticipate where crime hotspots are developing or will develop, and to set policing strategy accordingly.
INSURANCE Telematics in cars for insurance is already available in the UK. Because insurers get a better picture of individual driving patterns, they can adjust their risk calculations accordingly and offer individual (and competitive) prices to better / safer drivers.
FINANCIAL SERVICES Risk is at the heart of all financial services; banks and insurers just need to know how to price it correctly. In the case of stress testing, banks are now asked to consider the impacts of a wide range of scenarios on their business. The ability to run lots and lots of different risk scenarios directly affects pricing and, tactically, allows greater responsiveness, heading off problems before they become unmanageable.
UTILITIES An incredibly commoditised, mature and competitive sector; leveraging IP is one way it is responding.
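The tension between catching all fraud and avoiding false positives comes down to where a decline threshold is set on a model's fraud score. The sketch below assumes a model has already scored each transaction between 0 and 1; the `SCORED` list is made-up data and `rates_at_threshold` is a hypothetical helper, not any card issuer's or SAS's method. It reports, for a given threshold, the share of fraud caught and the share of genuine transactions wrongly declined.

```python
from typing import List, Tuple

# Hypothetical scored transactions: (model fraud score, actually fraudulent?)
SCORED: List[Tuple[float, bool]] = [
    (0.95, True), (0.80, True), (0.40, True),
    (0.70, False), (0.30, False), (0.20, False), (0.10, False), (0.05, False),
]


def rates_at_threshold(scored: List[Tuple[float, bool]], threshold: float) -> Tuple[float, float]:
    """Decline every transaction scoring >= threshold.

    Returns (share of fraud caught, share of genuine transactions declined).
    """
    fraud = [score for score, is_fraud in scored if is_fraud]
    genuine = [score for score, is_fraud in scored if not is_fraud]
    if not fraud or not genuine:
        raise ValueError("need both fraudulent and genuine transactions")
    caught = sum(score >= threshold for score in fraud) / len(fraud)
    declined_genuine = sum(score >= threshold for score in genuine) / len(genuine)
    return caught, declined_genuine


for t in (0.9, 0.5, 0.25):
    caught, false_pos = rates_at_threshold(SCORED, t)
    print(f"threshold {t}: fraud caught {caught:.0%}, genuine declined {false_pos:.0%}")
```

With these made-up scores, the only threshold that catches all three frauds also declines two of the five genuine transactions. Lowering the threshold alone cannot deliver 100% detection with no false positives; the model itself has to separate the two groups better.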
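Running "lots and lots of different risk scenarios" can be sketched as evaluating a loss formula over every combination of assumptions. This is a toy illustration, not a regulatory stress-testing methodology: the `PORTFOLIO` exposures (in unspecified units), baseline default probabilities, multipliers and loss-given-default values are all hypothetical. It uses the standard expected-loss relation (exposure × probability of default × loss given default), applied to a grid of scenarios.

```python
from itertools import product
from typing import Dict, Iterable, Tuple

# Hypothetical loan book: segment -> (exposure, baseline probability of default)
PORTFOLIO: Dict[str, Tuple[float, float]] = {
    "mortgages": (500.0, 0.01),
    "credit_cards": (100.0, 0.04),
    "business_loans": (200.0, 0.02),
}


def scenario_loss(pd_multiplier: float, loss_given_default: float) -> float:
    """Expected loss = sum over segments of exposure x stressed PD x LGD."""
    return sum(
        exposure * min(1.0, pd * pd_multiplier) * loss_given_default
        for exposure, pd in PORTFOLIO.values()
    )


def run_scenarios(
    pd_multipliers: Iterable[float], lgds: Iterable[float]
) -> Tuple[Dict[Tuple[float, float], float], Tuple[float, float]]:
    """Evaluate every combination of assumptions; return all losses and the worst case."""
    losses = {(m, lgd): scenario_loss(m, lgd) for m, lgd in product(pd_multipliers, lgds)}
    worst = max(losses, key=losses.get)
    return losses, worst


losses, worst = run_scenarios(pd_multipliers=(1.0, 1.5, 2.0, 3.0), lgds=(0.4, 0.5, 0.6))
print(f"{len(losses)} scenarios; worst case {worst} loses {losses[worst]:.1f}")
```

Each extra assumption multiplies the number of scenarios, which is why the ability to run many of them quickly matters for both pricing and responsiveness.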
IN CONCLUSION
ADOPTING BIG DATA ANALYTICS IS NOT WITHOUT CHALLENGES Source: The Current State of Business Analytics: Where Do We Go From Here? Prepared by Bloomberg Businessweek Research Services, 2011
BUT PLENTY TO GET EXCITED ABOUT! "Problems cannot be solved by the same level of thinking that created them." - Albert Einstein. Open Data; the power to analyse more; lots and lots of solutions...; framing the problem; knowledge systems; interpretation of data.
FURTHER READING http://www.sas.com/reg/wp/corp/46345
THANK YOU! peter.dorrington@sas.com www.sas.com