Challenges, Tools and Examples for Big Data Inference

Jean-François Plante, HEC Montréal
Closing Conference: Statistical and Computational Analytics for Big Data
June 12th, 2015

What is Big Data?
Dan Ariely from Duke University: [slide content not captured in the transcription]


Overview of the Opening Conference and Bootcamp
Held at Fields, January 12 to January [...].
[...] scientific talks.
Covering all themes of the Big Data program, one theme per day.
An overview paper is being prepared by the postdoctoral fellows and longer-term visitors at the Fields Institute.

Themes of the Program
Week one:
o Introductory Lectures and Overview
o Inference
o Environmental Science
o Optimization
Week two:
o Visualization
o Social Policy
o Health Policy
o Deep Learning
o Networks and Machine Learning

Why Do We Talk About Big Data?
Because we can! (Technology makes it possible.)
Because Big Data allows us to observe and measure behaviours or events about humans.
Because we can measure new things that are otherwise hard or impossible to evaluate.
Because imperfect, large, unstructured or hard-to-handle data may still contain valuable information that we should not dismiss.

Example #1: Measuring the Effect of Nutrition
David Buckeridge, McGill University, with INSPQ
Diet is known as an important factor in the study of disabilities, but very little is known about people's nutritional behaviour.
Nielsen: Information about all products sold by groceries and corner stores (from about 10% of all outlets) at the 3-digit postal code level. Match with UPC for nutrition.
Loyalty programs: Purchases at the household level. Can be combined with medical records of disabilities (e.g. diabetes).

Example #2: Predicting Insurgencies
Shane Reese, Brigham Young University
Insurgencies and riots are frequent in South America: 100s or 1000s in each country every year.
4 years of Twitter messages from South America.
The massive database is stored on a Hadoop file system.
Gold standard for insurgencies: GSR.
Occurrence of an insurgency is predicted by the volume of tweets, the presence of some keywords, and an increase in the use of The Onion Router (TOR), an online service to anonymize tweets.

Challenges from Volume
Methods fail on available computers: they do not scale well.
Exploratory Data Analysis is still crucial, but it is harder and more complex to perform.
Special infrastructure may be needed (e.g. a cluster for distributed data), using languages we are not typically trained for.
Asymptotics fail: The relative link between n and p is different (e.g. $n/p \to k < \infty$ as $n \to \infty$).

Challenges from Variety
New types of data are available and must be included in the analysis:
o Text
o Images
o Sound
o Video
o Networks
Data may be heterogeneous:
o Patrick Brown, UofT: spatial data with postal codes and census areas, which do not match and vary through time.
o Bo Li, U. of Illinois: Reconstructing temperature data from many proxies that vary through time (tree rings, pollen, ice cores, etc.).

Challenges Related to Veracity
Data were collected for a purpose other than the one we want to use them for.
They are observational, and thus typically not from the population of interest, which induces bias.
Data quality is hard to maintain in large administrative databases.
o Lisa Lix, U. Manitoba: Models to improve the quality.
Bias may be induced by model selection.
o Richard Lockhart, SFU: Inference from the LASSO.
o Ejaz Ahmed, Brock U.: Bias from small signals forced to 0.

Challenges from Velocity
Velocity is often a challenge when real-time decisions or predictions must be made.
Inference appears to be done mostly on fixed data, where velocity is not the main issue.
A notable exception: models that are designed to make online predictions have to be able to produce those predictions fast.

Solution #1: Building More Complex Models
With more data available, there is the possibility of fitting a much more complex model.
Deep learning is a very successful example of the power of more complex models (e.g. talk of Ruslan Salakhutdinov, UofT).
Many layers of latent variables.
Generates features automatically.
Demo:
o Finding similar images.
o Generating captions for images.

Solution #2: Assuming Sparsity
High dimensional data may have a lower dimensional underlying structure.
Sometimes, the dimension of a model may even exceed the sample size!
Assuming sparsity (i.e. that most coefficients are 0) is a possible solution.
The LASSO assumes that only some variables contribute to the signal. A penalty controls the number of null parameters (indirectly, by controlling their magnitude).
Regularization (a penalty to control the coefficients) is used for other models as well, including deep learning models.
Random projections map a high dimensional space to a smaller space where distances are (almost) preserved.
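As a rough illustration of the two ideas above, the following Python sketch (not taken from the talk) fits the LASSO by cyclic coordinate descent and applies a Gaussian random projection. The function names, the synthetic data and the tuning values are all assumptions made for the example; production work would normally rely on an established library rather than this minimal version.

```python
import numpy as np


def soft_threshold(z, t):
    """Shrink z toward 0 by t; entries with |z| <= t become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n             # x_j' x_j / n for each column
    residual = np.asarray(y, dtype=float).copy()  # equals y - X @ beta because beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                          # an all-zero column carries no signal
            residual += X[:, j] * beta[j]         # partial residual without coordinate j
            rho = X[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            residual -= X[:, j] * beta[j]
    return beta


def random_projection(X, k, rng):
    """Project the rows of X from p to k dimensions with a scaled Gaussian matrix."""
    p = X.shape[1]
    R = rng.standard_normal((p, k)) / np.sqrt(k)
    return X @ R


# Synthetic example (assumed values): p = 200 variables, n = 50 observations,
# and only the first 5 coefficients are non-zero.
rng = np.random.default_rng(0)
n, p = 50, 200
X = rng.standard_normal((n, p))
true_beta = np.zeros(p)
true_beta[:5] = 3.0
y = X @ true_beta + 0.5 * rng.standard_normal(n)

beta_hat = lasso_coordinate_descent(X, y, lam=0.5)
print("non-zero coefficients:", np.count_nonzero(beta_hat), "out of", p)

# Compare one pairwise distance before and after projecting from 2000 to 500 dimensions.
Z = rng.standard_normal((10, 2000))
Z_small = random_projection(Z, 500, rng)
ratio = np.linalg.norm(Z_small[0] - Z_small[1]) / np.linalg.norm(Z[0] - Z[1])
print("distance ratio after projection:", ratio)
```

The penalty weight `lam` plays the role described on the slide: increasing it pushes more coefficients to exactly 0, and a large enough value sets all of them to 0. The distance ratio printed at the end should be close to 1, which is the sense in which distances are almost preserved.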

Solution #3: Non-Convex Optimization
Regularization with convex functions is easy to optimize, but non-convex penalties offer better behaviour of the estimates.
Statistical problems do not tend to be adversarial, and it is possible to give guarantees of convergence.
Martin Wainwright, UC Berkeley: No point in optimizing beyond statistical precision. Local optima within a range of the global solution are acceptable.
Optimization for distributed data (and infrastructure).
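To make the contrast between convex and non-convex penalties concrete, the sketch below compares the thresholding rule of the LASSO with that of a non-convex penalty. The minimax concave penalty (MCP) is used purely as an assumed example, since the slide does not name a specific penalty. The formulas are the standard univariate solutions for a standardized predictor, and the function names and tuning values are illustrative.

```python
import numpy as np


def soft_threshold(z, lam):
    """LASSO rule: every surviving coefficient is shrunk by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def mcp_threshold(z, lam, gamma=3.0):
    """MCP rule (gamma > 1): small inputs are set to 0, large ones are left untouched."""
    z = np.asarray(z, dtype=float)
    shrunk = soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return np.where(np.abs(z) <= gamma * lam, shrunk, z)


z = np.array([0.5, 2.0, 5.0])   # assumed unpenalized estimates
print("LASSO:", soft_threshold(z, 1.0))
print("MCP:  ", mcp_threshold(z, 1.0))
```

With these inputs the LASSO rule returns 0, 1 and 4, while the MCP rule returns 0, 1.5 and 5: both remove the small signal, but only the non-convex rule leaves the large coefficient unshrunk. That is one sense in which such penalties can give better-behaved estimates.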

Solution #4: Developing New Visualization Tools
Two examples:
1. Papilio: Sheelagh Carpendale, U. of Calgary.
2. Sofia Olhede, UCL: Network histogram.

Solution #5: Developing New Asymptotics
The assumption that $n \to \infty$ while p is fixed is often violated. Classical results may not apply.
New asymptotic results are not only useful to develop methodology, but they also help us better understand the structure and behaviour of high-dimensional problems.
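A small simulation can illustrate why classical results may fail. When p is fixed and n grows, the eigenvalues of a sample covariance matrix concentrate around those of the true covariance. When p grows proportionally to n, they spread out instead (the Marchenko-Pastur phenomenon), even if the true covariance is the identity. The sketch below is an assumed illustration rather than material from the talk; the function name and the sample sizes are arbitrary choices.

```python
import numpy as np


def sample_cov_eigen_range(n, p, rng):
    """Smallest and largest eigenvalues of the sample covariance of n draws from N(0, I_p)."""
    X = rng.standard_normal((n, p))
    S = X.T @ X / n                       # the true covariance is the identity
    eigenvalues = np.linalg.eigvalsh(S)   # returned in ascending order
    return eigenvalues[0], eigenvalues[-1]


rng = np.random.default_rng(0)
lo_small, hi_small = sample_cov_eigen_range(n=1000, p=10, rng=rng)    # p/n = 0.01
lo_large, hi_large = sample_cov_eigen_range(n=1000, p=500, rng=rng)   # p/n = 0.5
print("p/n = 0.01:", lo_small, hi_small)
print("p/n = 0.5: ", lo_large, hi_large)
```

Every true eigenvalue equals 1 in both cases. In the first case the sample eigenvalues stay near 1, while in the second they range from well below 1 to well above 2, so any method that treats the sample covariance as a good estimate of the truth would be misled.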

Big Data as a Game Changer
Sallie Keller's analogy with Hubble: Big Data allows us to observe phenomena that were always there, but that we could not observe with previous technologies.
Applied sciences: the cost of research is shifting from data acquisition to data storage and analysis.
Data as a resource: In Business or in Urban Analytics, data are a resource that you must exploit to remain competitive.
Multidisciplinarity gives a big boost.

Statistics vs Computer Science
The Computer Science community has developed infrastructure and tools that make Big Data possible.
What can statisticians bring?
o A bigger focus on inference.
o A good intuition on potential sources of bias.
o A good understanding of stochasticity.
o Strategies to deal with noise (vs signal).
From Steve Scott, Google: Statisticians talk to humans, and the brain needs very low-dimensional input for interpretation. Computer scientists talk to computers, for which such low-dimensional input is not a requirement.

Conclusion: A Few Words of Wisdom
Knowledge and wisdom about inference are still valid. We should not dismiss what we already know because of the promises of Big Data.
Big Data traps according to David Buckeridge:
o Hubris: Seeing big data as a solution in isolation, rather than as potential added value to existing methods and theory.
o Dazzle: Starting with the data and looking for problems, rather than defining a problem then finding the data.
The hype around the term Big Data will probably fade, but the new challenges will remain.

Statistical Inference, Learning and Models for Big Data

Statistical Inference, Learning and Models for Big Data Statistical Inference, Learning and Models for Big Data Nancy Reid University of Toronto P.R. Krishnaiah Memorial Lecture 2015 Rao Prize Conference Penn State University May 15, 2015 P. R. Krishnaiah 1932

More information

Statistics, Big Data and Data Science!?

Statistics, Big Data and Data Science!? Statistics, Big Data and Data Science!? Prof. Dr. Göran Kauermann Ludwig-Maximilians-Universität Munich, Germany Statistics, Big Data and Data Science Statistics Founded around 1900 with the seminal work

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop

International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: simmibagga12@gmail.com

More information

Strategic Interaction and Markets

Strategic Interaction and Markets University Press Scholarship Online You are looking at 1-10 of 77 items for: keywords : general equilibrium Strategic Interaction and Markets Jean J. Gabszewicz Published in print: 2000 Published Online:

More information

Statistical Inference, Learning and Models in Big Data

Statistical Inference, Learning and Models in Big Data Statistical Inference, Learning and Models in Big Data Franke, Beate; Plante, Jean François; Roscher, Ribana; Lee, Annie; Smyth, Cathal; Hatefi, Armin; Chen, Fuqi; Gil, Einat; Schwing, Alexander; Selvitella,

More information

General overview, and sources and uses of Big Data for urban and regional analysis

General overview, and sources and uses of Big Data for urban and regional analysis General overview, and sources and uses of Big Data for urban and regional analysis Carson Farmer! @carsonfarmer " carsonfarmer.com # carson.farmer@hunter.cuny.edu TRB Executive Committee Policy Session,

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

ICT Perspectives on Big Data: Well Sorted Materials

ICT Perspectives on Big Data: Well Sorted Materials ICT Perspectives on Big Data: Well Sorted Materials 3 March 2015 Contents Introduction 1 Dendrogram 2 Tree Map 3 Heat Map 4 Raw Group Data 5 For an online, interactive version of the visualisations in

More information

Big Data, Official Statistics and Social Science Research: Emerging Data Challenges

Big Data, Official Statistics and Social Science Research: Emerging Data Challenges Big Data, Official Statistics and Social Science Research: Emerging Data Challenges Professor Paul Cheung Director, United Nations Statistics Division Building the Global Information System Elements of

More information

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013 Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013 Housekeeping 1. Any questions coming out of today s presentation can be discussed in the bar this evening 2. OCF is

More information

Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome

Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome Enhance Collaboration and Data Sharing for Faster Decisions and Improved Mission Outcome Richard Breakiron Senior Director, Cyber Solutions Rbreakiron@vion.com Office: 571-353-6127 / Cell: 803-443-8002

More information

Inference from sub-nyquist Samples

Inference from sub-nyquist Samples Inference from sub-nyquist Samples Alireza Razavi, Mikko Valkama Department of Electronics and Communications Engineering/TUT Characteristics of Big Data (4 V s) Volume: Traditional computing methods are

More information

Statistical Challenges with Big Data in Management Science

Statistical Challenges with Big Data in Management Science Statistical Challenges with Big Data in Management Science Arnab Kumar Laha Indian Institute of Management Ahmedabad Analytics vs Reporting Competitive Advantage Reporting Prescriptive Analytics (Decision

More information

Collaborations between Official Statistics and Academia in the Era of Big Data

Collaborations between Official Statistics and Academia in the Era of Big Data Collaborations between Official Statistics and Academia in the Era of Big Data World Statistics Day October 20-21, 2015 Budapest Vijay Nair University of Michigan Past-President of ISI vnn@umich.edu What

More information

HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS.

HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. HOW TO MAKE SENSE OF BIG DATA TO BETTER DRIVE BUSINESS PROCESSES, IMPROVE DECISION-MAKING, AND SUCCESSFULLY COMPETE IN TODAY S MARKETS. ALTILIA turns Big Data into Smart Data and enables businesses to

More information

Mobile Monetization Scenario Design & Big Data. Arther Wu Senior Director of Monetization and Business Operation

Mobile Monetization Scenario Design & Big Data. Arther Wu Senior Director of Monetization and Business Operation Mobile Monetization Scenario Design & Big Data Arther Wu Senior Director of Monetization and Business Operation Agenda Quick update of Cheetah Mobile Ad Scenario Design Big Data / Relation with Advertising

More information

Tweets as big data. Rob Procter and Alex Voss. www.analysingsocialmedia.org. rob.procter@manchester.ac.uk alex.voss@st andrews.ac.

Tweets as big data. Rob Procter and Alex Voss. www.analysingsocialmedia.org. rob.procter@manchester.ac.uk alex.voss@st andrews.ac. Tweets as big data Rob Procter and Alex Voss www.analysingsocialmedia.org rob.procter@manchester.ac.uk alex.voss@st andrews.ac.uk SRA, December 10th 2012 1 Overview The qualitative data deluge A small

More information

A Visualization is Worth a Thousand Tables: How IBM Business Analytics Lets Users See Big Data

A Visualization is Worth a Thousand Tables: How IBM Business Analytics Lets Users See Big Data White Paper A Visualization is Worth a Thousand Tables: How IBM Business Analytics Lets Users See Big Data Contents Executive Summary....2 Introduction....3 Too much data, not enough information....3 Only

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

The Big Picture on Big Data. Princeton Section 307 Dinner Meeting December 11, 2013 Richard Herczeg

The Big Picture on Big Data. Princeton Section 307 Dinner Meeting December 11, 2013 Richard Herczeg The Big Picture on Big Data Princeton Section 307 Dinner Meeting December 11, 2013 Richard Herczeg Objective of Talk 1. Deliver a Primer on Big Data. 2. How does this emerging topic apply to Quality? 3.

More information

Of all the data in recorded human history, 90 percent has been created in the last two years. - Mark van Rijmenam, Think Bigger, 2014

Of all the data in recorded human history, 90 percent has been created in the last two years. - Mark van Rijmenam, Think Bigger, 2014 What is Big Data? Of all the data in recorded human history, 90 percent has been created in the last two years. - Mark van Rijmenam, Think Bigger, 2014 Data in the Twentieth Century and before In 1663,

More information

Machine Learning for Data Science (CS4786) Lecture 1

Machine Learning for Data Science (CS4786) Lecture 1 Machine Learning for Data Science (CS4786) Lecture 1 Tu-Th 10:10 to 11:25 AM Hollister B14 Instructors : Lillian Lee and Karthik Sridharan ROUGH DETAILS ABOUT THE COURSE Diagnostic assignment 0 is out:

More information

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank

Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Danny Wang, Ph.D. Vice President of Business Strategy and Risk Management Republic Bank Agenda» Overview» What is Big Data?» Accelerates advances in computer & technologies» Revolutionizes data measurement»

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Research of Postal Data mining system based on big data

Research of Postal Data mining system based on big data 3rd International Conference on Mechatronics, Robotics and Automation (ICMRA 2015) Research of Postal Data mining system based on big data Xia Hu 1, Yanfeng Jin 1, Fan Wang 1 1 Shi Jiazhuang Post & Telecommunication

More information

Better Decision Making

Better Decision Making Better Decision Making Big Data Analytics Webinar, November 2013 Dr. Wolfgang Martin Analyst and Member of the Boulder BI Brain Trust Better Decision Making Process Oriented Businesses. Decision Making:

More information

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data

Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data CS535 Big Data W1.A.1 CS535 BIG DATA W1.A.2 Let the data speak to you Medication Adherence Score How likely people are to take their medication, based on: How long people have lived at the same address

More information

TECHNIQUES FOR OPTIMIZING THE RELATIONSHIP BETWEEN DATA STORAGE SPACE AND DATA RETRIEVAL TIME FOR LARGE DATABASES

TECHNIQUES FOR OPTIMIZING THE RELATIONSHIP BETWEEN DATA STORAGE SPACE AND DATA RETRIEVAL TIME FOR LARGE DATABASES Techniques For Optimizing The Relationship Between Data Storage Space And Data Retrieval Time For Large Databases TECHNIQUES FOR OPTIMIZING THE RELATIONSHIP BETWEEN DATA STORAGE SPACE AND DATA RETRIEVAL

More information

EXECUTIVE REPORT. Big Data and the 3 V s: Volume, Variety and Velocity

EXECUTIVE REPORT. Big Data and the 3 V s: Volume, Variety and Velocity EXECUTIVE REPORT Big Data and the 3 V s: Volume, Variety and Velocity The three V s are the defining properties of big data. It is critical to understand what these elements mean. The main point of the

More information

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Introduction to Big Data! with Apache Spark UC#BERKELEY# Introduction to Big Data! with Apache Spark" UC#BERKELEY# So What is Data Science?" Doing Data Science" Data Preparation" Roles" This Lecture" What is Data Science?" Data Science aims to derive knowledge!

More information

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management Big Data and New Paradigms in Information Management Vladimir Videnovic Institute for Information Management 2 "I am certainly not an advocate for frequent and untried changes laws and institutions must

More information

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION

REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION REFLECTIONS ON THE USE OF BIG DATA FOR STATISTICAL PRODUCTION Pilar Rey del Castillo May 2013 Introduction The exploitation of the vast amount of data originated from ICT tools and referring to a big variety

More information

IDC MaturityScape Benchmark: Big Data and Analytics in Government. Adelaide O Brien Research Director IDC Government Insights June 20, 2014

IDC MaturityScape Benchmark: Big Data and Analytics in Government. Adelaide O Brien Research Director IDC Government Insights June 20, 2014 IDC MaturityScape Benchmark: Big Data and Analytics in Government Adelaide O Brien Research Director IDC Government Insights June 20, 2014 IDC MaturityScape Benchmark: Big Data and Analytics in Government

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 BIG DATA: A New Technology Farah DeebaHasan Student, M.Tech.(IT) Anshul Kumar Sharma Student, M.Tech.(IT)

More information

Big Data for Development: What May Determine Success or failure?

Big Data for Development: What May Determine Success or failure? Big Data for Development: What May Determine Success or failure? Emmanuel Letouzé letouze@unglobalpulse.org OECD Technology Foresight 2012 Paris, October 22 Swimming in Ocean of data Data deluge Algorithms

More information

Tackling Challenging Problems in Academia & Industry - An Interdisciplinary Approach

Tackling Challenging Problems in Academia & Industry - An Interdisciplinary Approach Tackling Challenging Problems in Academia & Industry - An Interdisciplinary Approach Dr. Kurt Stockinger Associate Professor (Dozent) of Computer Science Director of Studies in Data Science ZHAW Datalab

More information

IDC MaturityScape Benchmark: Big Data and Analytics in Government

IDC MaturityScape Benchmark: Big Data and Analytics in Government IDC MaturityScape Benchmark: Big Data and Analytics in Government Adelaide O Brien Research Director, IDC aobrien@idc.com Presentation to ACT-IAC Emerging Technology SIG July, 2014 IDC MaturityScape Benchmark:

More information

CSC590: Selected Topics BIG DATA & DATA MINING. Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait

CSC590: Selected Topics BIG DATA & DATA MINING. Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait CSC590: Selected Topics BIG DATA & DATA MINING Lecture 2 Feb 12, 2014 Dr. Esam A. Alwagait Agenda Introduction What is Big Data Why Big Data? Characteristics of Big Data Applications of Big Data Problems

More information

The? Data: Introduction and Future

The? Data: Introduction and Future The? Data: Introduction and Future Husnu Sensoy Global Maksimum Data & Information Technologies Global Maksimum Data & Information Technologies The Data Company Massive Data Unstructured Data Insight Information

More information

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data

Understanding Your Customer Journey by Extending Adobe Analytics with Big Data SOLUTION BRIEF Understanding Your Customer Journey by Extending Adobe Analytics with Big Data Business Challenge Today s digital marketing teams are overwhelmed by the volume and variety of customer interaction

More information

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme

Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,

More information

BIG DATA: BIG BOOST TO BIG TECH

BIG DATA: BIG BOOST TO BIG TECH BIG DATA: BIG BOOST TO BIG TECH Ms. Tosha Joshi Department of Computer Applications, Christ College, Rajkot, Gujarat (India) ABSTRACT Data formation is occurring at a record rate. A staggering 2.9 billion

More information

Big Data and Marketing

Big Data and Marketing Big Data and Marketing Professor Venky Shankar Coleman Chair in Marketing Director, Center for Retailing Studies Mays Business School Texas A&M University http://www.venkyshankar.com venky@venkyshankar.com

More information

Big Data og Smart City. Knut H. H. Johansen CEO esmart System 7. mai 2015

Big Data og Smart City. Knut H. H. Johansen CEO esmart System 7. mai 2015 Big Data og Smart City Knut H. H. Johansen CEO esmart System 7. mai 2015 2 Smart Cities Big Data & Analytics Integrated Operations Smart City? No one definition for smart city > depends smartness comes

More information

BIG DATA FUNDAMENTALS

BIG DATA FUNDAMENTALS BIG DATA FUNDAMENTALS Timeframe Minimum of 30 hours Use the concepts of volume, velocity, variety, veracity and value to define big data Learning outcomes Critically evaluate the need for big data management

More information

BIG DATA: IT MAY BE BIG BUT IS IT SMART?

BIG DATA: IT MAY BE BIG BUT IS IT SMART? BIG DATA: IT MAY BE BIG BUT IS IT SMART? Turning Big Data into winning strategies A GfK Point-of-view 1 Big Data is complex Typical Big Data characteristics?#! %& Variety (data in many forms) Data in different

More information

Big Health Data the challenges and connections

Big Health Data the challenges and connections Big Data Big Health Data the challenges and connections Dr Trish Williams ehealth Research Group, School of Computer and Security Science, What are we looking at? Context Where to from here? Big Data Sources

More information

New Design Principles for Effective Knowledge Discovery from Big Data

New Design Principles for Effective Knowledge Discovery from Big Data New Design Principles for Effective Knowledge Discovery from Big Data Anjana Gosain USICT Guru Gobind Singh Indraprastha University Delhi, India Nikita Chugh USICT Guru Gobind Singh Indraprastha University

More information

MS1b Statistical Data Mining

MS1b Statistical Data Mining MS1b Statistical Data Mining Yee Whye Teh Department of Statistics Oxford http://www.stats.ox.ac.uk/~teh/datamining.html Outline Administrivia and Introduction Course Structure Syllabus Introduction to

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

How is Big Data Different? A Paradigm Shift

How is Big Data Different? A Paradigm Shift How is Big Data Different? A Paradigm Shift Jennifer Clarke, Ph.D. Associate Professor Department of Statistics Department of Food Science and Technology University of Nebraska Lincoln ASA Snake River

More information

BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS

BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS BIG DATA: BIG CHALLENGE FOR SOFTWARE TESTERS Megha Joshi Assistant Professor, ASM s Institute of Computer Studies, Pune, India Abstract: Industry is struggling to handle voluminous, complex, unstructured

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Modern (Computational) Approaches to Big Data Analytics. CSC 576 Computer Science, University of Rochester Instructor: Ji Liu

Modern (Computational) Approaches to Big Data Analytics. CSC 576 Computer Science, University of Rochester Instructor: Ji Liu Modern (Computational) Approaches to Big Data Analytics CSC 576 Computer Science, University of Rochester Instructor: Ji Liu Big Data in Academy SIGKDD 2014 (program page, found 14 big data, 50+ large

More information

Getting the Most from Demographics: Things to Consider for Powerful Market Analysis

Getting the Most from Demographics: Things to Consider for Powerful Market Analysis Getting the Most from Demographics: Things to Consider for Powerful Market Analysis Charles J. Schwartz Principal, Intelligent Analytical Services Demographic analysis has become a fact of life in market

More information

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum

Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Trends and Research Opportunities in Spatial Big Data Analytics and Cloud Computing NCSU GeoSpatial Forum Siva Ravada Senior Director of Development Oracle Spatial and MapViewer 2 Evolving Technology Platforms

More information

Can we Analyze all the Big Data we Collect?

Can we Analyze all the Big Data we Collect? DBKDA/WEB Panel 2015, Rome 28.05.2015 DBKDA Panel 2015, Rome, 27.05.2015 Reutlingen University Can we Analyze all the Big Data we Collect? Moderation: Fritz Laux, Reutlingen University, Germany Panelists:

More information

IMAV: An Intelligent Multi-Agent Model Based on Cloud Computing for Resource Virtualization

IMAV: An Intelligent Multi-Agent Model Based on Cloud Computing for Resource Virtualization 2011 International Conference on Information and Electronics Engineering IPCSIT vol.6 (2011) (2011) IACSIT Press, Singapore IMAV: An Intelligent Multi-Agent Model Based on Cloud Computing for Resource

More information

Big Data from a Database Theory Perspective

Big Data from a Database Theory Perspective Big Data from a Database Theory Perspective Martin Grohe Lehrstuhl Informatik 7 - Logic and the Theory of Discrete Systems A CS View on Data Science Applications Data System Users 2 Us Data HUGE heterogeneous

More information

BIG DATA: CONVENTIONAL METHODS MEET UNCONVENTIONAL DATA

BIG DATA: CONVENTIONAL METHODS MEET UNCONVENTIONAL DATA BIG DATA: CONVENTIONAL METHODS MEET UNCONVENTIONAL DATA Harvard Medical School & Harvard School of Public Health sharon@hcp.med.harvard.edu October 14, 2014 1 / 7 THE SETTING Unprecedented advances in

More information

Towards a Domain-Specific Framework for Predictive Analytics in Manufacturing. David Lechevalier Anantha Narayanan Sudarsan Rachuri

Towards a Domain-Specific Framework for Predictive Analytics in Manufacturing. David Lechevalier Anantha Narayanan Sudarsan Rachuri Towards a Framework for Predictive Analytics in Manufacturing David Lechevalier Anantha Narayanan Sudarsan Rachuri Outline 2 1. Motivation 1. Why Big in Manufacturing? 2. What is needed to apply Big in

More information

Five Questions to Ask Your Mobile Ad Platform Provider. Whitepaper

Five Questions to Ask Your Mobile Ad Platform Provider. Whitepaper June 2014 Whitepaper Five Questions to Ask Your Mobile Ad Platform Provider Use this easy, fool-proof method to evaluate whether your mobile ad platform provider s targeting and measurement are capable

More information

Big Data Specialized Studies

Big Data Specialized Studies Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate

More information

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3

COM CO P 5318 Da t Da a t Explora Explor t a ion and Analysis y Chapte Chapt r e 3 COMP 5318 Data Exploration and Analysis Chapter 3 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping

More information

1.1 Research in Geography [Meaning & Importance]

1.1 Research in Geography [Meaning & Importance] Department of Geography GEO 271 Everything is related to everything else, but near things are more related than distant things. - Waldo Tobler s First Law of Geography 1.1 Research in Geography [Meaning

More information

Predicting & Preventing Banking Customer Churn by Unlocking Big Data

Making Sense of Big Data. http://www.ngdata.com

Customer Centric Banking. June 2014, IBU Banking, SAP

EMPOWERED CUSTOMERS ARE 79% 53% 59% Digitally Connected of customers spend at least 50% of total shopping time researching brands online. Socially Networked

A Strategic Approach to Unlock the Opportunities from Big Data

Yue Pan, Chief Scientist for Information Management and Healthcare, IBM Research - China [contact: panyue@cn.ibm.com]. Big Data or Big Illusion?

Expected values, standard errors, Central Limit Theorem. Statistical inference

FPP 16-18. Statistical inference: Up to this point we have focused primarily on exploratory statistical analysis. We now dive into the realm of statistical

Collaborative Filtering. Radek Pelánek

2015. Assumption: users with similar taste in the past will have similar taste in the future. Requires only a matrix of ratings. Applicable in many domains.

Predicting & Preventing Banking Customer Churn by Unlocking Big Data

Customer Churn: A Key Performance Indicator for Banks. In 2012, 50% of customers, globally, either changed their banks or were planning

DATA ANALYTICS USING R

Duration: 90 Hours. Intended audience and scope: The course is targeted at fresh engineers, practicing engineers and scientists who are interested in learning and understanding data

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 11, November 2014. ISSN: 2277 128X. Research Paper. Available online at: www.ijarcsse.com Analytics

Is Big Data Bigger than a Bread Box?

Bradley Strauss, Chitika, Inc. January 14, 2014. The Basic Problem: The basic problem we face is simple to state: the "big" in big data is not well-defined, and perhaps

BIG DATA CHALLENGES AND PERSPECTIVES

Meenakshi Sharma 1, Keshav Kishore 2. 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

Big Data. Donald Kossmann & Nesime Tatbul Systems Group ETH Zurich

Goal of Today: What is Big Data? Introduce all major buzz words. What is not Big Data? Get a feeling for opportunities & limitations. Answering

TRAINING SCHOOL IN EXPERIMENTAL DESIGN & STATISTICAL ANALYSIS OF BIOMEDICAL EXPERIMENTS

March 3 1 April 15, University of Coimbra, Portugal. Supporters: CPD accreditation: FRAME delivers regular training

Big Data and Transactional Databases Exploding Data Volume is Creating New Stresses on Traditional Transactional Databases

Introduction: The world is awash in data and turning that data into actionable

Big Data in Pictures: Data Visualization

Huamin Qu, Hong Kong University of Science and Technology. What is data visualization? Data visualization is the creation and study of the visual representation of

ANALYTICS BUILT FOR INTERNET OF THINGS

Big Data Reporting is Out, Actionable Insights are In. In recent years, it has become clear that data in itself has little relevance, it is the analysis of it that

Data Analytics in Organisations and Business

Dr. Isabelle E-mail: isabelle.flueckiger@math.ethz.ch. Some organisational information: Tutorship: Gian Thanei:

How Master Data Management powers big data decision making.

Building an enterprise architecture that's decision ready. Bringing discipline to big data. The trouble with insight is it doesn

New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction

Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.

Analysis of Social Media Streams

Florian Weidner, Dresden, 21.01.2014. Outline: 1. Introduction 2. Social Media Streams: Clustering, Summarization

Big Data Analytics of Multi-Relationship Online Social Network Based on Multi-Subnet Composited Complex Network

pp. 273-284. http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Gengxin Sun 1, Sheng Bin 2 and

UNDERSTAND YOUR CLIENTS BETTER WITH DATA How Data-Driven Decision Making Improves the Way Advisors Do Business

Executive Summary: Financial advisors have long been charged with knowing the investors they

"BIG DATA A PROLIFIC USE OF INFORMATION"

Ojulari Moshood, Cameron University - IT4444 Capstone 2013. Abstract: The idea of big data is to better use the information generated by individuals to remake and

Big Data: Study in Structured and Unstructured Data

Motashim Rasool 1, Wasim Khan 2. mail2motashim@gmail.com, khanwasim051@gmail.com. Abstract: With the overlay of digital world, Information is available

Distance Metric Learning in Data Mining (Part I) Fei Wang and Jimeng Sun IBM TJ Watson Research Center

Outline: Part I - Applications: Motivation and Introduction; Patient similarity application. Part II

Turning Big Data into Big Decisions Delivering on the High Demand for Data

Michael Ho, Vice President of Professional Services. Digital Government Institute's Government Big Data Conference, October 31,

SURVEY REPORT DATA SCIENCE SOCIETY 2014

Table of Contents: About the Initiative; Report Summary; Participants Info; Participants Expertise; Suggested Discussion Topics; Selected Responses

Data Science Will computer science and informatics eat our lunch?

Thomas Lumley, University of Auckland. @tslumley, statschat.org.nz, notstatschat.tumblr.com. In the 1920s, the computing labs helped establish

The Must Dos of Your Digital Strategy

Welcome. Antonie Geerts, Managing Director, Seditio Digital Consultancy, @AntonieGeerts. About Seditio: Putting our Clients first. Digital Consultancy & Training. We want

Data Mining and Exploration. Data Mining and Exploration: Introduction. Relationships between courses. Overview. Course Introduction

Amos Storkey, School of Informatics. January 10, 2006. http://www.inf.ed.ac.uk/teaching/courses/dme/ Course Introduction: Welcome, Administration

Data Refinery with Big Data Aspects

International Journal of Information and Computation Technology. ISSN 0974-2239, Volume 3, Number 7 (2013), pp. 655-662. International Research Publications House. http://www.irphouse.com/ijict.htm

Big Picture of Big Data Software Engineering With example research challenges

Nazim H. Madhavji, UWO, Canada; Andriy Miranskyy, Ryerson U., Canada; Kostas Kontogiannis, NTUA, Greece. madhavji@gmail.com avm@ryerson.ca

Big Data in Healthcare: Myth, Hype, and Hope

Woojin Kim, MD. Disclosure: Co-founder/Shareholder, Montage Healthcare Solutions, Inc; Consultant, Infiniti Medical, LLC

Why Big Data is not Big Hype in Economics and Finance?

Ariel M. Viale, Marshall E. Rinker School of Business, Palm Beach Atlantic University. West Palm Beach, April 2015. 1. The Big Data Hype 2. Big Data as

Big Analytics: A Next Generation Roadmap

Cloud Developers Summit & Expo: October 1, 2014. Neil Fox, CTO, SoftServe, Inc. Remember Life Before The Web? 1994. Even Revolutions Take Time