Big Data and Science: Myths and Reality




Big Data and Science: Myths and Reality H.V. Jagadish http://www.eecs.umich.edu/~jag

Six Myths about Big Data
- It's all hype
- It's all about size
- It's all analysis magic
- Reuse is easy
- It's the same as Data Science
- It's all in the cloud

Big Data Myth 1: Big Data is all hype.

Data Analysis Has Been Around for a While
- R.A. Fisher
- Peter Luhn
- W.E. Deming
- 1970: Relational Database, E.F. Codd
- Howard Dresner
Abridged version of Jeff Hammerbacher's timeline for CS 194 at UCB, 2012

Breathless Journalists!!

Big Data Impetus
- Can collect cheaply, due to digitization.
- Can store cheaply, due to falling media prices.
- Driven by business process automation and the web.
- But now impacting everywhere.

Nearly every field of endeavor is transitioning from data poor to data rich
- Astronomy: LSST
- Physics: LHC
- Oceanography: OOI
- Neuroscience: EEG, fMRI
- Sociology: The Web
- Biology: Sequencing
- Economics: mobile, POS terminals
- Data-Driven Medicine
- Sports
Slide courtesy of the Moore Sloan Data Science Environments Initiative

Data-Driven Science (Jim Gray)
1. Empirical + experimental
2. Theoretical
3. Computational
4. Data-Intensive
Slide courtesy of the Moore Sloan Data Science Environments Initiative

Big Data Hype? The Gartner Hype Cycle
Just because it's hyped doesn't mean we can or should ignore it.
Slide courtesy of Michael Franklin

Big Data Fact 1
Myth: Big Data is all hype.
Fact: It may be hyped, but there is more than enough substance there for it to deserve our attention.

Big Data Myth 2: Size is all that matters. Challenges are only at the extremes (in size).

What is Big Data? Gartner Definition: Volume, Velocity, Variety, Veracity, V..

Variety
- How do you even measure variety? No measure => hard to track progress.
- Infinite variety on the web: you keep finding sites you have never seen before.
- Infinite variety in human-generated content.

Veracity
- Who do you trust? Reputation on the web.
- Independence determination: when is it a new source and when is it a copy?
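One simple way to make the "new source or copy?" question concrete is to measure how much text two sources share. The sketch below is only an illustration, not a method from the talk: it compares word 3-grams ("shingles") with Jaccard similarity, and the function names, the 0.5 threshold, and the sample sentences are all assumptions made for the example. Real independence determination also has to handle paraphrase, shared upstream sources, and copying of structured values.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def likely_copy(text_a, text_b, threshold=0.5):
    """Flag two texts as probable copies when their shingle overlap is high.

    The threshold is an arbitrary illustrative value, not a recommended one.
    """
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold


# Hypothetical example texts.
original = "the unemployment rate fell to five percent in march"
reposted = "the unemployment rate fell to five percent in march officials said"
independent = "joblessness dropped last month according to a new survey"

print(likely_copy(original, reposted))     # near-verbatim repost
print(likely_copy(original, independent))  # same topic, different wording
```

Note that the second pair reports the same fact in different words and is treated as independent, which is the desired outcome here but also shows the limit of a purely lexical test.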

Big Data Fact 2
Myth: Size is all that matters.
Fact: Yes, Volume and Velocity are challenging, but Variety and Veracity are far more challenging.

Big Data Myth 3: Big Data -> Analysis Magic -> Deep Insights

Companies Propagate This!!
From the web site of a representative Silicon Valley company

The Big Data Pipeline

Big Data Challenges
In each of the steps.
Read the whitepaper: http://cra.org/ccc/docs/init/bigdatawhitepaper.pdf
Shorter version in CACM, July 2014.

Big Data Fact 3: Every aspect of the data ecosystem poses challenges that must be addressed.

Big Data Myth 4: Data reuse is low-hanging fruit.
- Lots of data collected for some purpose
- Can (later) be used for a different purpose

Unemployment Rate Prediction based on Tweets Cafarella, Levenstein, Shapiro http://econprediction.eecs.umich.edu/

Data is Organized Wrong
- E.g. administrative data is often rolled up by administrative jurisdiction. Consider Butler County, Ohio.
- How to compare data rolled up by school district with data rolled up by zip code?
- Working with Gates Foundation: create *estimated* data rolled up by desired jurisdiction.
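To illustrate what "estimated data rolled up by desired jurisdiction" can mean, here is a minimal sketch of proportional allocation (often called areal weighting). It is not the method used in the project mentioned above, which the slides do not describe. The function name, region names, and numbers are hypothetical, and the approach assumes the quantity is spread across each source region in proportion to a known overlap share.

```python
from collections import defaultdict


def reaggregate(source_totals, overlap_shares):
    """Estimate totals for target regions from totals reported by source region.

    source_totals:  {source_region: total}
    overlap_shares: {(source_region, target_region): share}, where share is the
                    assumed fraction of the source region (e.g. of its
                    population) that falls inside the target region. Shares for
                    each source region should sum to 1.
    """
    estimates = defaultdict(float)
    for (source, target), share in overlap_shares.items():
        estimates[target] += source_totals[source] * share
    return dict(estimates)


# Hypothetical data: enrollment reported by school district.
enrollment_by_district = {"District A": 1000, "District B": 400}

# Hypothetical overlap shares between school districts and zip codes.
shares = {
    ("District A", "ZIP 1"): 0.75,
    ("District A", "ZIP 2"): 0.25,
    ("District B", "ZIP 2"): 1.0,
}

estimated_by_zip = reaggregate(enrollment_by_district, shares)
print(estimated_by_zip)
```

The output values are estimates, not observations: they are only as good as the overlap shares and the uniformity assumption, which is why the slide stresses *estimated*.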

Research Data Reuse
- Much data is now available
  - Strong push from federal agencies
  - Parallel push from reproducibility advocates
- But obstacles remain
  - Incentives to record metadata. Very hard for a third party to use otherwise.
  - Data citation methodology and convention

Big Data Fact 4
Myth: Data reuse is low-hanging fruit.
Fact: Data reuse is critical to address. It holds out great promise, but also poses many challenging questions.

Big Data Myth 5: Data Science is the same as Big Data.

Data Science
- The use of data to address problems in a domain of interest.
- Requires data management, data analysis, and domain knowledge.
- Often involves Big Data, but may not.

[Diagram: Data Science at the intersection of Computer & Information Sciences, Statistical & Mathematical Sciences, and Domain Sciences]

Data Science Status
- Importance widely recognized in academia. Partly driven by employer demand.
- Multi-disciplinary nature recognized.
- Common solution is to have some sort of structure that overlays and crosses traditional departments. E.g. http://minds.umich.edu

Big Data Fact 5
Myth: Data Science is the same as Big Data.
Fact: Data Science is related to, but different from, Big Data.

Big Data Myth 6: The central challenge with Big Data is that of devising new computing architectures and algorithms.

Big Data Myth 6 (reprise)
- Big Data is all in the cloud
- Big Data = MapReduce-style computation

What is Big Data?
- Volume
- Velocity
- Variety
- Veracity
More than you know how to handle.

Humans and Big Data
- We can buy bigger systems, more machines, faster CPUs, larger disks.
- But human ability does not scale!
- Big Data poses huge challenges for human interaction.

Usability for Data Science
- Data Science tasks usually involve data analysis by a domain expert with limited database expertise.
- If the domain expert is to succeed, the data must be usable.
- Usability matters most when the data are big.

Database Usability
Improve the user's ability to complete a task with a (big) database through better:
- Query formulation
- Result presentation
HCI principles are very useful. But usability is not interface design.
See http://www.eecs.umich.edu/db/usable

Big Data Fact 6
Myth: Big Data is all about the cloud.
Fact: The cloud has its place in the constellation of relevant technologies, but is not a required piece of every solution. In fact, there are many other challenges that are at least as important; cf. the National Academies report on Frontiers of Massive Data Analysis.

Acknowledgments: NSF Grants 1017296 and 1250880

Big Data and Data Science
- Lots of buzz
- With good reason
- Great potential
- Many challenges