Statistics, Big Data and Data Science!?


Prof. Dr. Göran Kauermann
Ludwig-Maximilians-Universität Munich, Germany

Statistics, Big Data and Data Science
- Statistics: Founded around 1900 with the seminal work of Pearson and later Fisher
- Big Data: The big topic with the three (four) V's
- Data Science: Proposed by Cleveland (2001, 2005): Learning from Data: Unifying Statistics and Computer Science

Statistics
"Statistics is (the) science that pertains to the collection, analysis, interpretation and presentation of data." (Wikipedia)

Statistics: the first 100 years
[Timeline figure with the marks 1900, 1950, 2000 and 2015 and three phases]
- Statistical Foundations: likelihood inference, statistical tests, ANOVA, linear regression, EDA etc.
- Statistical Modelling: generalised regression, computational statistics, MCMC, R Project, smooth regression, data mining
- Statistics and Big Data?: inference in Big Data, computational statistics, data science
Is statistics ready for the next century?

Statistics in Germany
Statistics has prospered in Germany over the last 10 years:
- TU Dortmund and LMU Munich (BA and MA)
- HU/FU/TU Berlin, Bielefeld, Göttingen (MA)
- Ulm, Bremen, Heidelberg, Bamberg, Trier, Mainz, Magdeburg (special programs)
- Mathematics departments and economics departments
Are the German statisticians ready for the next century?

Big Data: Everybody talks about it!
"Big Data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it." (Dan Ariely, 2013)

Big Data: The Buzzword
Financial Times Magazine (March 2014):
"Big Data is a vague term for a massive phenomenon that has rapidly become an obsession with entrepreneurs, scientists, governments and the media."
"As with so many buzzwords, big data is a vague term, often thrown around by people with something to sell."
Is Big Data the new gold rush?

Big Data: the four V's
Big Data are classified with the four V's:
- Volume: Big Data are large in size
- Variety: Big Data are complex
- Velocity: Big Data arrive at high speed and at high resolution
- Veracity: Big Data may not be reliable (bias issues)

Big Data: From Data to Knowledge
Wired Magazine (June 2008):
"The End of Theory: The data deluge makes the scientific method obsolete."
"The End of Theory: With enough data, the numbers speak for themselves."
Big Data, is this the end of statistics?

Gartner's Hype Cycle
[Figure; source: Gartner Blog Network]

Big Data: The Two Extreme Opinions
The two views about Big Data:
- With enough data we don't need theory and we can explain the world.
- Big Data is just hype and will die out sooner or later.
Big Data, a challenge or the end of statistics?

Big Data: End of Statistics?
Let's answer the question with Big Data: https://www.google.com/trends/
Google Trends records which keywords are searched on Google, when, where, etc.


Big Data: End of Theory?
Is Big Data the death of Statistics?
"Statisticians have spent the past 200 years figuring out what traps lie in wait when we try to understand the world through data. The data are bigger, faster and cheaper these days, but we must not pretend that the traps have all been made safe." (Financial Times Magazine, Tim Harford, 28.3.2014)

Data Scientists: Why are they needed by industry?

Big Data Example 1
[Figure; source: Lazer et al., 2014, Science, Vol. 343]

Big Data Example 1: Google's Flu Trends
The trend worked nicely, but then it failed, since:
- Correlation is not equal to causation
- Working out what causes what needs a model and data

Big Data Example 2: Price Elasticity Estimation
- Research project with a large German airline
- Problem: estimation of price elasticity
- Huge (!!) database containing prices and ticket sales
- Regression model: Ticket Sales = s(price) + error
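A minimal simulation sketch of why a regression of ticket sales on price can mislead when the price itself reacts to demand. Everything here is an assumption made for illustration: the linear demand equation (the slide uses a smooth function s), the pricing rule, the true slope of -2 and the variable names are hypothetical and do not come from the airline project.

```python
import random

random.seed(42)

TRUE_SLOPE = -2.0  # assumed true effect of price on ticket sales
N = 20_000         # number of simulated observations

prices, sales = [], []
for _ in range(N):
    demand_shock = random.gauss(0, 1)  # unobserved demand (e.g. holidays)
    cost_shock = random.gauss(0, 1)    # price variation unrelated to demand
    # Hypothetical pricing rule: the price goes up when demand is high.
    price = 10 + demand_shock + cost_shock
    # Assumed demand equation: sales fall with price and rise with demand.
    sale = 100 + TRUE_SLOPE * price + demand_shock
    prices.append(price)
    sales.append(sale)


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


naive_slope = ols_slope(prices, sales)
print(f"true slope:  {TRUE_SLOPE}")
print(f"naive slope: {naive_slope:.2f}")
```

Under these assumptions the naive slope converges to -2 + 1/2 = -1.5 rather than -2, because the demand shock drives both price and sales. More data does not remove this gap; it only estimates the wrong number more precisely. The direction and size of the bias depend on the assumed pricing rule.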

Big Data Example 2: Price Elasticity Estimation
- Problem: The price is NOT exogenous!
- Demand depends on price and price depends on demand
- The data-based price elasticity is overestimated
- The problem is well known in econometrics

Big Data Example 3: Big Computing versus Sampling
- Big Data often demand Big Computing
- Information, however, can also be retrieved from a sample
- Example: network data (e.g. Facebook)
- Statisticians know how to sample

Big Data Example 3
- The Sonntagsfrage (the regular German voting-intention poll) asks roughly 1,000 people about their political views
- Sample: 1,000 out of about 60 million
- Margin of error (standard deviation)
- Why is it better to ask just 1,000 people and not 60 million, if possible?
- With 60 million the sampling error diminishes, but sampling bias occurs
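The trade-off between sampling error and sampling bias in the Sonntagsfrage example can be sketched numerically. The code below is an illustration under stated assumptions, not the method of any polling institute: the true share of 40%, the scaled-down population of 200,000 (standing in for about 60 million) and the response probabilities of the self-selected sample are all hypothetical.

```python
import math
import random


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a share estimated
    from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


random.seed(1)

TRUE_SHARE = 0.40          # assumed true share of supporters
POPULATION_SIZE = 200_000  # scaled-down stand-in for about 60 million

population = [1 if random.random() < TRUE_SHARE else 0
              for _ in range(POPULATION_SIZE)]

# Small simple random sample, as in a classical poll.
poll = random.sample(population, 1000)
poll_estimate = sum(poll) / len(poll)

# Large self-selected sample: supporters are assumed to respond
# more often than non-supporters (hypothetical probabilities).
RESPONSE_PROB = {1: 0.6, 0: 0.4}
volunteers = [v for v in population if random.random() < RESPONSE_PROB[v]]
volunteer_estimate = sum(volunteers) / len(volunteers)

print(f"margin of error, n=1000: {margin_of_error(1000):.3f}")
print(f"random poll (n={len(poll)}): {poll_estimate:.3f}")
print(f"self-selected (n={len(volunteers)}): {volunteer_estimate:.3f}")
```

With n = 1,000 the worst-case margin of error is about 3.1 percentage points. Under the assumed response probabilities the self-selected sample is far larger, yet its estimate settles near 0.24 / 0.48 = 0.50 instead of 0.40: its sampling error is tiny, but the bias does not shrink with size.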

We conclude: Big Data and Statistics
- Big Data does not make theory (thinking) obsolete.
- Big Data analytics needs statistical thinking and reasoning.
- But: Statistics also needs to tackle Big Data issues.

Big Data and Statistics
David Spiegelhalter: "Complete bollocks. Absolute nonsense. There are a lot of small data problems that occur in big data. They don't disappear because you've got lots of the stuff. They get worse."

Big Data and Statistics
Other statements:
- David Hand: "We have a new resource here. But nobody wants data. What they want are the answers."
- Patrick Wolfe: "It's the wild west right now. People who are clever and driven will twist and turn and use every tool to get sense out of these data sets, and that's cool. But we're flying a little bit blind at the moment."

Big Data: A further (single) Statistician's View
- Statistics needs more involvement in the Big Data wave
- Statistical ideas and models are useful and need to be scaled up
- The old statistics is not dying out (p-values and small samples remain useful)
- A new paradigm: approximate data analysis may be better than optimal fitting procedures
(Göran Kauermann, 2016)

Statistics versus Data Scientists
What is Data Science?
Cleveland (2001): Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics
- Data Science = Statistics of tomorrow?
or
- Data Science = Statistics carried out by non-statisticians?

Statistics and Data Scientists
[Timeline figure: Statistics (1900), Computer Science (1950), Data Science (2000)]

Quotes from Cleveland
"Computer scientists, waking up to the value of the information stored, processed and transmitted by today's computing environments, have attempted to fill the void. One current of work is data mining. But the benefit to the data analyst has been limited, because the knowledge among computer scientists about how to think of and approach the analysis of data is limited, just as the knowledge of computing environments by statisticians is limited. A merger of the knowledge bases would produce a powerful force for innovation."

Data Science: What is it about?
- Data Science combines informatics and statistics in order to extract information from real data.
- "Data Science is a blend of Red-Bull-fuelled hacking and espresso-inspired statistics." (Mike Driscoll, CEO Metamarkets)

Data Scientists: What do they do?
- Retrieve information from data
- Apply machine learning tools
- Deal with data confidentiality
- Communicate the results
- Use statistical models
Source: C. O'Neil, R. Schutt (2014), Doing Data Science, O'Reilly Media Inc., USA.

Statistics and Computer Science
The stereotypes:
- Computer scientists predict and forecast
- Statisticians model and interpret
But both tackle the question: How can we make the data speak?

Data Science
- The definition of Data Science is not yet consolidated
- We consider Data Science as 50% Statistics and 50% Informatics (Computer Science)
- Master in Data Science at LMU (Elite Network of Bavaria)

Data Science @ LMU
- Program starts Oct 2016
- International program
- 50% Statistics and 50% Informatics
- www.datascience-munich.de

Challenges in Data Science
- Collaboration: Big Data occur outside of statistics/informatics
- Training: More master programs in Data Science
- Consolidation: Data Science is Data Science

Statistics, Big Data and Data Science
- Statistics and Computer Science merged into Data Science
- Big Data are the driving force
- Classical Statistics remains important
- New challenges in Statistics/Informatics

Challenges in Statistics
Do we need optimal solutions?
- Approximate inference, smart and real-time computing
- Parallel computing
Do we need asymptotic statistics?
- We have large n, so why bother about mathematical asymptotics?
- What does n → ∞ really mean?

Challenges in Statistics
Do we need significance tests?
- Model selection is important
- Significance versus relevance
Do we need statistical models at all?
- Stochastic character remains in big data
- Simple stochastic models are too simple

Challenges in Statistics
Do we need correlation?
- Dependence structure is relevant
- Copula or more complex models
Do we need linear models at all?
- Linear models and linear procedures are fast
- Linear approximations are often sufficient
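The point "significance versus relevance" made above can be illustrated with a small calculation. This is a sketch under simplifying assumptions: a one-sample z-test with known standard deviation, an observed mean difference taken to be exactly 0.01 standard deviations, and hypothetical function and variable names.

```python
import math


def two_sided_p_value(effect, sigma, n):
    """Two-sided p-value of a one-sample z-test when the observed
    mean differs from the null value by `effect`."""
    z = effect / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))


EFFECT = 0.01  # assumed, practically negligible difference
SIGMA = 1.0    # assumed known standard deviation

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: p = {two_sided_p_value(EFFECT, SIGMA, n):.3g}")

p_small = two_sided_p_value(EFFECT, SIGMA, 100)
p_big = two_sided_p_value(EFFECT, SIGMA, 1_000_000)
```

The same negligible effect is far from significant at n = 100 but overwhelmingly significant at n = 1,000,000, which is why a p-value alone says little about relevance once n is large.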

The Statistical Approach
After all: The statistical paradigm remains
[Diagram linking Questions, Data, Model, Estimates and Answers]

Many thanks