Big Data: how it changes the way you treat data

Oct. 2013. Chung-Min Chen, Chief Scientist, Info. Analysis Research & Services. The views and opinions expressed in this presentation are those of the author and do not necessarily reflect the position of the company. 2012 Applied Communication Sciences. All Rights Reserved.

About ACS. Company history: Bellcore (Applied Research), 1985-1999; Telcordia (Advanced Technology Solutions), 1999-2012; Ericsson, 2012-2013. Big data R&D: stream processing (Tribeca: A Stream Database Manager for Network Traffic Analysis, VLDB '96); latent semantic indexing; telecom: CDR/subscriber reconciliation, service assurance.

Hope or Hype?

Hope or Hype? Big data will change* the way you live, the way you work, and the way you think. Or is big data a big bubble? Remember .com and Web 2.0? (Figure: the hype cycle.) * Big Data: A Revolution That Will Transform How We Live, Work, and Think, V. Mayer-Schönberger and K. Cukier.

"big data" on Google Trends (chart)

Has big data reached its hype peak? (Poll results; source: kdnuggets.com. Bar height is proportional to the number of votes.)

4 V's of Big Data. "Big data is data whose scale, diversity, and/or timeliness requires new architectures and analytics to unlock business value." (EMC²) Figure source: datasciencecentral.com.

Big Data Definition Revisited. "Data that is expensive to manage, and hard to extract value from" (UCB AMP Lab). "Too big, too expensive, and too hard to handle!" (MIT). Source: Oracle.

Big data is not about data size; it's about new ways of thinking about how to treat data.

Big Data Technologies (diagram). Technologies: OLAP, mining, learning, visualization, NoSQL, parallel programming, distributed file systems, analytics platforms. The V's: value, variety, veracity, volume, velocity.

Quantity change leads to quality change. Passiveness leads to fidelity: in the past, volunteers filled in questionnaires (and were subject to the observer effect); now, big data is passively collected and analyzed. Scrutiny leads to discovery. Sampling shortfalls: true random sampling is hard, samples lack detail, and they miss targets.
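To make "missing targets" concrete, here is a minimal simulation sketch. The population size, the 1-in-10,000 target rate, and the 1% sampling fraction are hypothetical numbers chosen for illustration, and `sample_hits` is a helper introduced here, not something from the talk. Scanning the full data finds every target, while a simple random sample is expected to contain only about one, far too few to study that subgroup in any detail.

```python
import random

def sample_hits(population, is_target, fraction, seed=0):
    """Draw a simple random sample and count how many targets it contains."""
    rng = random.Random(seed)
    sample = rng.sample(population, k=int(len(population) * fraction))
    return sum(1 for x in sample if is_target(x)), len(sample)

def is_target(x):
    # HYPOTHETICAL rule: 1 record in 10,000 belongs to the rare group of interest
    return x % 10_000 == 0

# HYPOTHETICAL population of 1,000,000 record ids
population = range(1_000_000)

full_hits = sum(1 for x in population if is_target(x))  # full scan sees every target
hits, n = sample_hits(population, is_target, fraction=0.01)
expected = full_hits * 0.01  # expected number of targets in a 1% sample
print(f"full data: {full_hits} targets; 1% sample of {n}: {hits} targets (expected ~{expected:.0f})")
```

Changing `seed` changes how many targets the sample happens to catch, which is itself part of the point: with so few hits, the sample's picture of the rare group is mostly luck.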

Machine Translation. Linguistic model: dictionary and grammar; rule-based. Statistical model: digests a bilingual text corpus; pattern-match-based. How to improve accuracy: improve existing algorithms, develop new algorithms, or increase the training size (text corpus). (Chart: accuracy vs. training size.)
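One way a statistical model "digests" a bilingual corpus is to learn word-translation probabilities purely from co-occurrence across sentence pairs. The slide does not name a specific model; the sketch below uses IBM Model 1 (trained with expectation-maximization, without the usual NULL word) as one classic example, on a hypothetical three-sentence German-English toy corpus. No dictionary or grammar rules are supplied: the pairings emerge from the data alone.

```python
from collections import defaultdict

def train_ibm_model1(corpus, iterations=10):
    """Estimate word-translation probabilities t(e | f) with EM (IBM Model 1, no NULL word).

    corpus: list of (foreign_tokens, english_tokens) sentence pairs.
    Returns a dict mapping (e, f) -> t(e | f).
    """
    english_vocab = {e for _, es in corpus for e in es}
    # Start from a uniform t(e | f) for every English word
    t = defaultdict(lambda: 1.0 / len(english_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: spread each English word over the foreign words it could align to
        for fs, es in corpus:
            for e in es:
                z = sum(t[(e, f)] for f in fs)
                for f in fs:
                    delta = t[(e, f)] / z
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalize expected counts into probabilities t(e | f)
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return dict(t)

def best_translation(t, f):
    """Most probable English word for the foreign word f."""
    candidates = {e: p for (e, ff), p in t.items() if ff == f}
    return max(candidates, key=candidates.get)

# HYPOTHETICAL toy corpus
corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_ibm_model1(corpus)
for f in ["das", "haus", "buch", "ein"]:
    print(f, "->", best_translation(t, f))
```

After a few iterations "haus" should map to "house" and "buch" to "book", because "the" is explained away by "das", which appears with it twice. More sentence pairs sharpen these estimates, which is the slide's "increase training size" lever.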

Machine Translation (cont.): examples of statistical translation errors. 松下問童子 ("Under the pine, I asked the boy", a line from a classical Chinese poem; 松下 is also Panasonic's Japanese name) was translated as "Panasonic asked the boy" and "Panasonic asked the lad". 小心墜河 ("Beware of falling into the river", a warning sign) was translated as "Carefully fall into the river" and "Carefully zhuihe".

Elections. The Obama big data team: targeted fund-raising; social-network-based canvassing and get-out-the-vote (拉票, 催票); targeted TV advertising. Big-data-based prediction: Nate Silver vs. the Washington elite, big data vs. phone polls. Sources: "Inside the Secret World of Quants and Data Crunchers Who Helped Obama Win", TIME Magazine, Nov. 7, 2012; "How Vertica Was the Star of the Obama Campaign, and Other Revelations", www.citoresearch.com, Jan. 16, 2013.

Linguistics Research. 500M tweets per day enable the study of language evolution. Example findings: older users write :-), younger users :) (Stanford Univ.); younger users use expressive lengthening, e.g. "Coooool" (Univ. of Twente); women use "I" and "!!!" more often, and gender can be predicted with 75% accuracy (MITRE). Challenges: the data is biased toward young, urban users, and speech is nonstandard, e.g. "Ima call #mybf now". Source: "The Linguist's Mother Lode: What Twitter reveals about slang, gender and no-nose emoticons", TIME, Sep. 9, 2013.
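Findings like these start from simple text features counted over millions of tweets. A minimal sketch of two such features follows. Treating "expressive lengthening" as the same letter repeated three or more times, and splitting smileys into ":-)" (with nose) and ":)" (no nose), are assumptions made for illustration, not the definitions used in the cited studies.

```python
import re

# ASSUMPTION: "expressive lengthening" = the same letter repeated 3+ times in a row
LENGTHENING = re.compile(r"([a-z])\1{2,}", re.IGNORECASE)
NOSED_SMILEY = re.compile(r":-\)")   # ":-)" with a nose
NO_NOSE_SMILEY = re.compile(r":\)")  # ":)" without a nose

def has_lengthening(text: str) -> bool:
    """True if some letter is repeated three or more times consecutively."""
    return LENGTHENING.search(text) is not None

def smiley_counts(text: str) -> tuple[int, int]:
    """Return (nosed, no-nose) smiley counts."""
    return len(NOSED_SMILEY.findall(text)), len(NO_NOSE_SMILEY.findall(text))

for tweet in ["Coooool :)", "that was cool :-)", "Ima call #mybf now"]:
    print(tweet, has_lengthening(tweet), smiley_counts(tweet))
```

The same nonstandard spelling that makes these features informative is also what trips up off-the-shelf language tools, which is the "nonstandard speech" challenge on the slide.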

2. Correlation prevails over causality. 知其然而不知其所以然 ("knowing that it is so, without knowing why it is so"). Knowing correlation is good enough: prediction without explanation. Causality is hard, sometimes impossible, to verify. Do high-voltage stations/towers cause cancer? Do base stations cause cancer? Does frequent mobile phone usage cause cancer?

Doctors vs. Computers: who do you trust? ER crisis at Cook County Hospital, 1996: the ER was flooded with chest-pain patients. Who should be admitted (i.e., who is having a real heart attack)? Standard manual procedure: blood pressure, stethoscope, questions, ECG. Result: 90% of admitted patients were false positives, and recall was 83% (the share of actual heart-attack patients who were admitted). Sources: M. Gladwell, Blink: The Power of Thinking Without Thinking; Goldman L, Cook EF, Brand DA et al. A computer protocol to predict myocardial infarction in emergency department patients with chest pain. N Engl J Med 1988; 318(13):797-803.
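The two rates measure different things. Reading "90% of admitted are false positives" as the share of admitted patients without a heart attack (that is, 1 minus precision) and "recall" as the share of real heart attacks that were admitted, the arithmetic looks like this. The patient counts are hypothetical, picked only so the rates match the slide.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp)  # share of admitted patients who really had a heart attack
    recall = tp / (tp + fn)     # share of real heart attacks that were admitted
    return precision, recall

# HYPOTHETICAL counts: 100 admitted, of whom 10 had a heart attack; 2 heart attacks were sent home
p, r = precision_recall(tp=10, fp=90, fn=2)
print(f"false positives among admitted: {1 - p:.0%}, recall: {r:.0%}")
# prints "false positives among admitted: 90%, recall: 83%"
```

A procedure can score well on one metric and badly on the other: admitting everyone gives 100% recall with even worse precision, which is why the comparison with the decision tree reports both.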

Doctors vs. Computers: who do you trust? (cont.) A 3-level decision tree: (a) unstable angina pain? (b) fluid in the lungs? (c) systolic BP < 100? (Figure: the tree tests (a) at the root, (b) on both branches, and (c) on all four branches below.) Results: false positives < 30% (vs. > 90% by doctors); recall > 95% (vs. 83% by doctors).
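The tree's structure can be sketched in a few lines. The slide does not say what each of the eight leaves decides, so the leaf rule below ("admit if at least two answers are yes") is a hypothetical placeholder, not the published protocol; the field names such as `fluid_in_lungs` are likewise introduced here for illustration. What the sketch does show faithfully is the shape: one question per level, (a) at the root, (b) on both branches, (c) on all four.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    admit: bool  # decision at the end of a path

@dataclass
class Node:
    question: str  # key of a yes/no answer in the patient record
    yes: "Tree"
    no: "Tree"

Tree = Union[Leaf, Node]

# The three questions from the slide, one per level of the tree
QUESTIONS = ["unstable_angina", "fluid_in_lungs", "systolic_bp_below_100"]

def build(depth: int = 0, yes_count: int = 0) -> Tree:
    """Build a full 3-level tree: (a) at the root, (b) on both branches, (c) on all four."""
    if depth == len(QUESTIONS):
        # HYPOTHETICAL leaf rule (admit if at least two answers are "yes"),
        # used only to fill the eight leaves; it is not the published protocol.
        return Leaf(admit=yes_count >= 2)
    return Node(
        QUESTIONS[depth],
        yes=build(depth + 1, yes_count + 1),
        no=build(depth + 1, yes_count),
    )

def classify(tree: Tree, patient: dict) -> bool:
    """Follow yes/no branches from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.yes if patient[tree.question] else tree.no
    return tree.admit

tree = build()
patient = {"unstable_angina": True, "fluid_in_lungs": True, "systolic_bp_below_100": False}
print(classify(tree, patient))  # True under the hypothetical leaf rule
```

Every decision takes exactly three yes/no answers, which is part of why such a protocol is fast and consistent at the bedside.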

Less is more: feature extraction. Other features seem to be insignificant: age, job (pressure, hours), exercise, history of high BP, weight, heart disease, sweating.

2. 知其然而不知其所以然 ("knowing that, not knowing why"), cont. Correlation prevails over causality. Is knowing correlation good enough? Well, not all the time. Approaches to explanation: mechanical causality, Bayesian networks, data provenance ("explain what I found").

Data Provenance (figure courtesy of Prof. Renée Miller, Univ. of Toronto)
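Provenance answers "explain what I found" by recording which inputs produced each output. As a minimal sketch of one simple form, lineage for an aggregate query, the helper below sums a value per group and keeps the ids of the input rows behind each total. The function name and the sales rows are hypothetical, and this is not the specific approach shown in the figure.

```python
from collections import defaultdict

def sum_with_lineage(rows, key, value):
    """Group rows by `key`, sum `value`, and record which input row ids produced each total."""
    totals = defaultdict(float)
    lineage = defaultdict(list)
    for row_id, row in enumerate(rows):
        totals[row[key]] += row[value]
        lineage[row[key]].append(row_id)  # remember where this contribution came from
    return dict(totals), dict(lineage)

# HYPOTHETICAL input rows
rows = [
    {"region": "east", "sales": 120.0},
    {"region": "west", "sales": 80.0},
    {"region": "east", "sales": 30.0},
]
totals, lineage = sum_with_lineage(rows, "region", "sales")
print(totals)   # {'east': 150.0, 'west': 80.0}
print(lineage)  # {'east': [0, 2], 'west': [1]}
```

When a total looks surprising, its lineage points straight back to the records responsible, so the finding can be checked rather than taken on faith.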

2. 知其然而不知其所以然 ("knowing that, not knowing why"), cont. Correlation prevails over causality. Is knowing correlation good enough? Well, not all the time. Be careful not to ignore causality altogether: crowded parking lots → higher sales? Orange cars → fewer defects?
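The parking-lot example is a classic confounder pattern: a hidden variable drives both measurements. In the sketch below, the toy numbers are hypothetical and constructed so that daily shoppers drive both parked cars and sales. Cars and sales are then almost perfectly correlated, but after removing the linear effect of shoppers from each (a partial-correlation check), the correlation drops to essentially zero. Filling the lot with cars would not raise sales.

```python
def mean(v):
    return sum(v) / len(v)

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, z):
    """Remove the least-squares linear effect of z from y."""
    mz, my = mean(z), mean(y)
    slope = sum((a - mz) * (b - my) for a, b in zip(z, y)) / sum((a - mz) ** 2 for a in z)
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]

# HYPOTHETICAL toy data: daily shoppers (the confounder) drive BOTH cars and sales
shoppers = [100, 150, 200, 250, 300, 350, 400, 450]
noise_a = [1, -1, -1, 1, 1, -1, -1, 1]  # chosen to be uncorrelated with shoppers
noise_b = [1, -1, 1, -1, -1, 1, -1, 1]  # ... and with noise_a
cars = [0.4 * s + 2 * e for s, e in zip(shoppers, noise_a)]
sales = [25 * s + 50 * e for s, e in zip(shoppers, noise_b)]

raw = pearson(cars, sales)
partial = pearson(residuals(cars, shoppers), residuals(sales, shoppers))
print(f"corr(cars, sales) = {raw:.3f}; after removing shoppers = {partial:.3f}")
```

The correlation is real and still useful for prediction, but only the adjusted view shows that cars are not the lever that moves sales.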

Issues. Privacy: notice and consent (Target), opt-out (Google), anonymization (Netflix). Societal impact: acting before it happens; the big data divide.

Recap and Trends