Big Data Analytics. David Dietrich, EMC Education Services. April 4, 2013




Big Data Analytics Harvard-Smithsonian Center for Astrophysics Data Science Training for Librarians April 4, 2013 David Dietrich, EMC Education Services

"I'll go into a company and say, 'What data problems can we solve?' We get blank looks," he says. When he asks, instead, what things can help a company lose money and make money, "usually two out of three are problems that data can solve." Anthony Goldbloom, CEO of Kaggle

Agenda
In other words:
Level setting on Big Data
Emerging Need for Advanced Analytics
Tools, Technology, & Skill Development


Using Social Graphs to Map the Spread of Innovation Ideas

Examples of Big Data Analytics

Big Data Analytics: Industry Examples
Health Care: Reduce time needed to detect pandemics; provide vaccines where they are needed most.
Telecommunications: Improve customer churn prediction with social media data.
Financial Services: Accelerate lending decisions using big data sources.
Industry verticals shown: Financial, Medical, Government, Internet, Phone/TV, Retail.


Big Data Improves Healthcare: Traditional Approach to Distributing Vaccines
Traditional approach: Vaccine distribution is usually based on regional population or first-come, first-served. Agencies wait for the reports from hospitals and agencies before distributing vaccines.
Challenges with the traditional approach: Distribution methods do not focus on patients most in need of vaccines. Waiting for the reports can take 3-6 months. During the wait, the pandemic may get out of control.

Big Data Improves Healthcare: New Approach to Distributing Vaccines
Health agencies can now use social networks, such as Twitter, to detect pandemic trends in near real-time:
1. Search tweets with certain keywords such as "flu", "vaccine" and "immunization" to find potential patients.
2. Look through these patients' social networks to identify their infection patterns.
3. Make maps of people tweeting to find out pandemic trends on a global or local scale.
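The keyword search and regional tally described above might be sketched as follows. This is a minimal illustration only: the tweet records, field names, and regions are hypothetical, and step 2 (walking each patient's social network) is left out.

```python
from collections import Counter

# Hypothetical sample records; a real pipeline would pull these from a social media API.
tweets = [
    {"user": "a", "region": "Boston", "text": "Home sick with the flu again"},
    {"user": "b", "region": "Boston", "text": "Got my vaccine today"},
    {"user": "c", "region": "Austin", "text": "Great game last night"},
    {"user": "d", "region": "Austin", "text": "Flu season is brutal this year"},
    {"user": "e", "region": "Boston", "text": "Immunization clinic opens Monday"},
]

KEYWORDS = {"flu", "vaccine", "immunization"}  # keywords named on the slide


def mentions_keyword(text):
    """Return True if any keyword appears as a whole word in the text."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return not KEYWORDS.isdisjoint(words)


# Step 1: find tweets that mention a keyword.
matches = [t for t in tweets if mentions_keyword(t["text"])]

# Step 3: count matching tweets per region as a crude stand-in for a map.
by_region = Counter(t["region"] for t in matches)
print(by_region.most_common())
```

Whole-word matching is used here so that a keyword embedded in a longer word does not count as a hit; a production system would need far more careful text handling.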


Churn Analysis for Mobile Telco
Definitions: Churn is the term used to describe customer attrition or loss. Churn rate is the number of participants who discontinue their use of a service divided by the average number of total participants during a period.
Reasons to churn: easy to switch provider; inadequate services; quality of service; plenty of attractive offers; customer dissatisfaction; difficult to manage the customer data.
Can we predict churn? If so, how?
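The churn rate definition above can be written out as a small calculation. In this sketch the average number of participants is taken as the mean of the start-of-period and end-of-period counts, which is an assumption; the slide does not say how the average is computed, and the figures are made up.

```python
def churn_rate(customers_lost, customers_at_start, customers_at_end):
    """Customers lost divided by the average customer count over the period."""
    # Assumption: average = midpoint of the start and end counts.
    average_customers = (customers_at_start + customers_at_end) / 2
    if average_customers == 0:
        raise ValueError("average customer count must be positive")
    return customers_lost / average_customers


# Hypothetical month: 500 cancellations against a steady base of 10,000 customers.
rate = churn_rate(customers_lost=500, customers_at_start=10_000, customers_at_end=10_000)
print(f"{rate:.1%}")  # 5.0%
```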

Churn Analysis for Mobile Telco
Synopsis: A mobile telco company was losing customers and wanted to understand why.
Business challenge: Proactively detect mobile phone customers at risk of canceling contracts (customer churn) to retain customers and protect revenue.
Traditional approach to churn analysis: Look at spending patterns; review recurrent problems.
Approach with Big Data: Analyze call history data; treat call history as a social network.
Figure: Cell phone history portrayed as a social network.

Example of Cell Phone Cancellation Outbreak (figure series: Month 1, Month 2, Month 3, Month 4)

Using Social Network Analysis to Improve Churn Prediction
High-risk cell phone churners can now be identified in 1 hour, saving $40 MM in the first year.
If we had known two customers' calling networks, could we have prevented five more from leaving?
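One simple way to treat call history as a social network is to link customers who call each other and score each active customer by the share of their contacts who have already cancelled. The sketch below assumes that scoring rule and uses invented call records; the slides do not describe the actual model used.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee) pairs.
calls = [("ann", "bob"), ("ann", "cara"), ("bob", "cara"),
         ("cara", "dan"), ("dan", "eve"), ("eve", "ann")]
churned = {"bob", "cara"}  # customers who already cancelled

# Build an undirected graph: two customers are linked if either called the other.
neighbors = defaultdict(set)
for caller, callee in calls:
    neighbors[caller].add(callee)
    neighbors[callee].add(caller)


def churn_exposure(customer):
    """Fraction of a customer's contacts who have already churned."""
    contacts = neighbors[customer]
    if not contacts:
        return 0.0
    return len(contacts & churned) / len(contacts)


# Rank the remaining customers from most to least exposed.
active = sorted(set(neighbors) - churned)
at_risk = sorted(active, key=churn_exposure, reverse=True)
for customer in at_risk:
    print(customer, round(churn_exposure(customer), 2))
```

Customers at the top of the ranking are the ones whose calling circle is already leaving, which is the "outbreak" pattern the month-by-month figures illustrate.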


Underwriting Risk: Traditional Approach to Loan Processing
Figure: traditional underwriting risk level versus big data enabled underwriting risk level (traditional data leveraged versus big data leveraged).

Underwriting Risk: Big Data Enabled Loan Processing, Streamlined Process
Figure: traditional underwriting risk level versus big data enabled underwriting risk level (traditional data leveraged versus big data leveraged).

Underwriting Risk: Big Data Enabled Loan Processing, Shorter Time to Decision
Figure: average time to close a home loan application across pre-approval, underwriting, and closing, today versus big data enabled. Durations shown: 2-3 weeks and 3-4 weeks, with a ~30% improvement.

Big Data

Big Data
Key characteristics: Large Volumes, New Sources, Low Latencies.
Implications for the enterprise: New Platforms, New Roles, New Techniques.

What's Driving the Data Deluge?
Mobile Sensors
Video Surveillance
Social Media
Oil Exploration: oil rigs generate 25,000 data points per second.
Smart Grids: reading smart meters every 15 minutes is 3,000x more data intensive.
Medical Imaging
Video Rendering
Gene Sequencing: the cost to sequence one genome has fallen from $100M in 2001 to $4K in 2013.
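The smart meter figure is roughly what falls out of comparing 15-minute readings with a single reading per month. The monthly baseline is an assumption made for this illustration; the slide does not state what the 3,000x is measured against.

```python
# Assumption (not stated on the slide): the baseline is one manual reading per month.
READINGS_PER_HOUR = 4        # one reading every 15 minutes
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30          # rough month length

readings_per_month = READINGS_PER_HOUR * HOURS_PER_DAY * DAYS_PER_MONTH
baseline_readings_per_month = 1

ratio = readings_per_month / baseline_readings_per_month
print(readings_per_month, ratio)  # 2880 2880.0
```

Under that assumption the ratio is 2,880, which is in line with the rounded 3,000x on the slide.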

What is Big Data?
big data: datasets so large they break traditional IT infrastructures.

Four Main Types of Data Structures
Structured Data, Quasi-Structured Data, Semi-Structured Data, Unstructured Data.
Examples shown on the slide:
View Source
http://www.google.com/#hl=en&sugexp=kjrmc&cp=8&gs_id=2m&xhr=t&q=data+scientist&pq=big+data&pf=p&sclient=psyb&source=hp&pbx=1&oq=data+sci&aq=0&aqi=g4&aql=f&gs_sm=&gs_upl=&bav=on.2,or.r_gc.r_pw.,cf.osb&fp=d566e0fbd09c8604&biw=1382&bih=651
The Red Wheelbarrow, by William Carlos Williams
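A search URL like the one on this slide has no formal schema, but its key=value pairs can still be recovered with some effort. The sketch below uses Python's standard urllib.parse on a shortened copy of that URL; reading the parameters out of the fragment (the part after #) is a choice made for this example.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened copy of the search URL shown on the slide.
url = ("http://www.google.com/#hl=en&sugexp=kjrmc&cp=8&gs_id=2m&xhr=t"
       "&q=data+scientist&pq=big+data&pf=p&sclient=psyb&source=hp")

# The parameters sit in the fragment, so parse that rather than the query string.
fragment = urlsplit(url).fragment
params = parse_qs(fragment)

print(params["q"])   # ['data scientist']
print(params["pq"])  # ['big data']
```

Here the current search term (q) and the previous one (pq) come out as ordinary fields, which is the kind of signal clickstream analysis relies on.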

Opportunities for a New Approach to Analytics: Big Data Ecosystem
1. Data Devices
2. Data Collectors
3. Data Aggregators
4. Data Users/Buyers
Entities shown: Individual, Law Enforcement, Analytic Services, Government, Medical, Information Brokers, Internet, Advertising, Marketers, Websites, Employers, Media, Phone/TV, Retail, Catalog Co-Ops, Media Archives, Banks, Credit Bureaus, Financial, Government, List Brokers, Delivery Service, Private Investigators/Lawyers.

Industries Are Broadly Embracing Data Science
Retail: CRM, Customer Scoring, Store Siting and Layout, Fraud Detection / Prevention, Supply Chain Optimization
Advertising & Public Relations: Demand Signaling, Ad Targeting, Sentiment Analysis, Customer Acquisition
Financial Services: Algorithmic Trading, Risk Analysis, Fraud Detection, Portfolio Analysis
Media & Telecommunications: Network Optimization, Customer Scoring, Churn Prevention, Fraud Prevention
Manufacturing: Product Research, Engineering Analytics, Process & Quality Analysis, Distribution Optimization
Energy: Smart Grid, Exploration
Government: Market Governance, Counter-Terrorism, Econometrics, Health Informatics
Healthcare & Life Sciences: Pharmaco-Genomics, Bio-Informatics, Pharmaceutical Research, Clinical Outcomes Research

Emerging Need for Advanced Analytics

Business Drivers for Advanced Analytics
Current Business Problems Provide Opportunities for Organizations to Become More Analytical & Data Driven
Driver: Examples
1. Desire to optimize business operations: Sales, pricing, profitability, efficiency
2. Desire to identify business risk: Customer churn, fraud, default
3. Predict new business opportunities: Upsell, cross-sell, best new customer prospects
4. Comply with laws or regulatory requirements: Anti-Money Laundering, Fair Lending, Basel II

Big Data Requires New Approaches to Analytics: Business Intelligence versus Data Science
Figure axes: analytical approach (explanatory to exploratory) against time (past to future).
Predictive Analytics and Data Mining (Data Science)
Typical techniques and data types: Optimization, predictive modeling, forecasting, statistical analysis. Structured/unstructured data, many types of sources, very large data sets.
Common questions: What if...? What's the optimal scenario for our business? What will happen next? What if these trends continue? Why is this happening?
Business Intelligence
Typical techniques and data types: Standard and ad hoc reporting, dashboards, alerts, queries, details on demand. Structured data, traditional sources, manageable data sets.
Common questions: What happened last quarter? How many did we sell? Where is the problem? In which situations?
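The contrast between the two kinds of question can be made concrete with a toy example: a business intelligence query totals what already happened, while a data science question projects the trend forward. The sales figures below are hypothetical, and an ordinary least-squares line stands in for whatever forecasting method a real project would choose.

```python
# Hypothetical monthly unit sales for the last two quarters.
monthly_sales = [100, 110, 120, 130, 140, 150]

# Business intelligence: "How many did we sell last quarter?"
last_quarter_total = sum(monthly_sales[-3:])

# Data science: "What if these trends continue?"
# Fit a least-squares line y = intercept + slope * x to the history.
n = len(monthly_sales)
xs = range(n)
mean_x = sum(xs) / n
mean_y = sum(monthly_sales) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, monthly_sales))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

# Extrapolate one month past the end of the history.
next_month_forecast = intercept + slope * n
print(last_quarter_total, next_month_forecast)
```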

Tools, Technology, & Skill Development

Data Science is a Team Sport
Key roles for a successful analytic project: Business User, Project Sponsor, Project Manager, Business Intelligence Analyst, Database Administrator (DBA), Data Engineer, Data Scientist.

Data Analytics Lifecycle
1. Discovery
2. Data Prep
3. Model Planning
4. Model Building
5. Communicate Results
6. Operationalize

Leveraging Data Science Throughout the Organization
Sales: Identify associations between items frequently purchased together.
Marketing: Clustering analysis to group similar customers together.
Finance: Apply regression analysis to predict starting salaries.
Human Resources: Use decision trees to predict employee turnover.
R & D: Text analysis of log files for service and security analysis.
Customer Support: Classify support requests for intelligent routing.
Manufacturing: Run simulations to optimize complex process flows.
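As an illustration of the Sales example above, associations between items purchased together can be found by counting how often each pair of items shares a basket. This is a minimal sketch with made-up transactions and an arbitrary support threshold of 40%; full association rule mining (for instance the Apriori algorithm) goes further than pair counting.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each is the set of items in one purchase.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "chips"},
]

# Count every unordered pair of items that appears in the same basket.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = share of transactions containing both items.
support = {pair: count / len(transactions) for pair, count in pair_counts.items()}

# Keep pairs that appear together in at least 40% of baskets (arbitrary threshold).
frequent = {pair: s for pair, s in support.items() if s >= 0.4}
print(frequent)
```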

Data Sources for Analytic Projects
Categories: Internal Data Sources, External Data Sources.
Sources shown: Social Media, Customer Demographics, Mfg, On-line Portal, CRM System, Marketing and Sales, Customer Support, Business, HR, R&D, Finance, ERP System, Sales Lead Repository, Economic Indicators.

Tools and Technologies for Big Data Analytics
Table columns: Domain, Free/Open Source, Commercial.
Domains: Statistical Analysis and Data Mining; NoSQL; Natural Language Processing.

Evolution of Big Data Analytics
Figure labels: Embedding Analytical Intelligence, Info Computing, Distributed Computing, Standalone analytics, Simplifying Big Data.

Growth of Data Scientist Opportunities
Figure: Job Trends from Indeed.com.
"A significant constraint on realizing value from big data will be a shortage of talent, particularly of people with deep expertise in statistics and machine learning, and the managers and analysts who know how to operate companies by using insights from big data."
"By 2018...the United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data."
Source: McKinsey Global Institute, Big data: The next frontier for innovation, competition and productivity, May 2011.
Average "data scientist" salaries for job postings nationwide are 55% higher than average salaries for all job postings nationwide.

People & Skills: Three Key Roles of the New Data Ecosystem
Deep Analytical Talent (Data Scientists): projected U.S. talent gap of 140,000 to 190,000.
Data Savvy Professionals: projected U.S. talent gap of 1.5 million.
Technology & Data Enablers.
Note: Figures above reflect a projected talent gap in the US in 2018, as shown in the McKinsey May 2011 article Big Data: The next frontier for innovation, competition, and productivity.

Profile of a Data Scientist
Quantitative; Technical; Curious & Creative; Skeptical; Communicative & Collaborative.

Skills Matrix, Based on Recent Students
Figure axes: Quantitative Skills against Technical Ability.
Groups plotted: Quantitative Analysts, Statisticians, Business and Data Analysts; Data Scientists; Recent STEM Grads; Business Intelligence Professionals, IT.

Data Science and Big Data Analytics Course and EMCDSA Certification
Course overview and details: Open curriculum. Practitioner's approach. Enables immediate participation on analytics projects. Prepares for EMC Proven Professional Data Science Associate (EMCDSA) Certification.

Two New EMC Data Science Courses for Business Transformation
Business Leaders (90 min, new): Introducing Data Science and Big Data Analytics for Business Transformation
Heads of Data Science Teams (1 day, new): Data Science and Big Data Analytics for Business Transformation
Aspiring Data Scientists (5 days): Data Science and Big Data Analytics

EMC Academic Alliance
Provides students with a competitive edge.
Partners with colleges and universities to prepare students for roles in data science and cloud computing.
1,000+ institutions in 60+ countries.
Provides unique open courseware at no cost.
Program resources: free faculty readiness training; instructor materials; online faculty and student communities; discount certification exam vouchers.

Specific Data Science Skills & Traits
Figure label: EDW.
Apply data science methods in their current roles.

Other Ways to Learn about Big Data Analytics
Formal training: EMC Data Science & Big Data Analytics course; STEM graduate programs and certificates; conferences on analytics (Strata, PAW, ACM, ACL, INFORMS).
Free Massive Open Online Courses (MOOCs): 6-12 week online courses from edX, Coursera, Udacity, Udemy, iTunes U, Khan Academy.
Informal training: Look for opportunities to try out your skills; your day job provides this. Offer to help on projects, opportunistically. Every team is looking for people with these skills right now.

Leverage The Wisdom of Crowds
Social Media
Volunteer to help
Try contests: Kaggle, InnoCentive

Key Takeaways
Analyzing big data provides significant opportunity for deriving new value.
To do this, individuals will need to step up to the challenges and opportunities that Big Data and advanced analytics provide.
Look for opportunities to grow your skills and drive new value as a Data Scientist, Data-Savvy Librarian...

Closing Thoughts
How will you use Big Data Analytics? Do you want to:
Map the flow of ideas in research literature?
Use citation networks to identify the most influential researchers?
Predict award-winning research papers?
Increase collaboration with researchers and faculty?
Challenge traditional thinking using analytics?
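For the citation network question, a first rough measure of influence is simply how often each researcher is cited. The sketch below assumes that measure and uses invented citation pairs; measures such as PageRank would also weigh who is doing the citing.

```python
from collections import Counter

# Hypothetical citation edges: (citing author, cited author).
citations = [("lee", "kim"), ("park", "kim"), ("kim", "cho"),
             ("lee", "cho"), ("park", "cho"), ("cho", "lee")]

# In-degree of each author in the citation network.
times_cited = Counter(cited for _, cited in citations)

# Authors ordered from most to least cited.
ranking = times_cited.most_common()
print(ranking)
```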

Questions?
Additional Resources:
1. My Blog on Data Science & Big Data Analytics: http://infocus.emc.com/author/david_dietrich/ (David Dietrich, @imdaviddietrich)
2. Blog on applying Data Analytics Lifecycle to measuring innovation data: http://stevetodd.typepad.com/my_weblog/data-science-andbig-data-curriculum/
3. EMC Education Services curriculum on Data Science & Big Data Analytics: http://education.emc.com/guest/campaign/data_science.aspx