Healthcare data analytics. Da-Wei Wang, Institute of Information Science, wdw@iis.sinica.edu.tw

1 Healthcare data analytics. Da-Wei Wang, Institute of Information Science

2 Outline: data science; enabling technologies; grand goals; issues; Google Flu Trends; privacy; conclusion

5 Analytics: statistics; machine learning (decision tree, artificial neural network, support vector machine, Bayesian network); deep learning; graph analytics; natural language processing

6 MapReduce: a programming model for large-scale computing problems, built on parallel and distributed computing.

7 MapReduce steps:
1. Distribute the data to machines (mappers).
2. Map: compute what you need from each data item, emitting (key, value) pairs.
3. Shuffle and sort the pairs by key.
4. Reduce: aggregate, summarize, filter, or transform the values for each key (reducer).
5. Output the results.

8 Example: compute word frequency
1. Distribute web pages to machines (mappers).
2. Map: for each word w in a page, emit a pair (w, c), where c is the number of occurrences of w in that page.
3. Shuffle and sort the pairs by key.
4. Reduce: for each word w, add up all c_i from its pairs (w, c_i).
5. Output the totals.
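The word-frequency steps above can be imitated in a single Python process, which makes the data flow easy to trace. This is a minimal sketch, not Hadoop or any real MapReduce framework: the "machines" are just list elements, splitting on whitespace stands in for real tokenization, and the function names are illustrative.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter


def map_document(text):
    """Map step: emit (word, count) pairs for one document."""
    counts = Counter(text.lower().split())
    return list(counts.items())


def shuffle_and_sort(mapped_outputs):
    """Shuffle/sort step: gather the pairs from all mappers and sort by key."""
    pairs = [pair for output in mapped_outputs for pair in output]
    return sorted(pairs, key=itemgetter(0))


def reduce_counts(sorted_pairs):
    """Reduce step: for each word w, add up its partial counts c_i."""
    return {
        word: sum(count for _, count in group)
        for word, group in groupby(sorted_pairs, key=itemgetter(0))
    }


def word_frequency(documents):
    # In a real cluster each document would be handled by a different mapper;
    # here every "mapper" runs in the same process, one after another.
    mapped = [map_document(doc) for doc in documents]
    return reduce_counts(shuffle_and_sort(mapped))


if __name__ == "__main__":
    pages = ["flu fever cough", "fever flu flu", "cough medicine"]
    print(word_frequency(pages))
```

Running it prints `{'cough': 2, 'fever': 2, 'flu': 3, 'medicine': 1}`. In a real framework the shuffle is performed by the system across the network; the user writes only the map and reduce functions.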

9 Visualization. The main goal of data visualization is to communicate information clearly and effectively through graphical means. To convey ideas effectively, aesthetic form and functionality need to go hand in hand, providing insight into a rather sparse and complex data set by communicating its key aspects in a more intuitive way. Example: Hans Rosling, Gapminder.

11 Heterogeneity in healthcare. Multiple forms: insurance claims, physician notes, images, conversations about health in social media, and data from wearables and other monitoring devices. Multiple agencies: providers, payers, employers, personalized genetic-testing companies (23andMe), social media, and patients.

12 The Learning Healthcare System Series

13 The goal of a learning healthcare system is to deliver the best care every time, and to learn and improve with each care experience. Because each care experience counts, the system generates massive amounts of data, which calls for analytics.

14 Precision Medicine. Precision medicine is an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person. It is becoming feasible because electronic health records have been widely adopted, genomic analysis costs have dropped significantly, and data science has become increasingly sophisticated.

15 Precision Medicine Initiative. In January 2015, President Obama called for $215 million in fiscal year 2016 to support the Initiative: $130 million was allocated to NIH to build a national, large-scale research participant group, called a cohort, and $70 million was allocated to the National Cancer Institute to lead efforts in cancer genomics.

16 Not only for profit

17 Issues with big data analytics: overfitting; model complexity; association (correlation) vs. causality; understanding and explanation vs. prediction; the shift from parametric to non-parametric models and from equational to algorithmic models (Wolfgang Pietsch, "Big Data: The New Science of Complexity").

18 Cautionary notes on Google Flu Trends: "Detecting influenza epidemics using search engine query data", Nature 2009 (letter); "When Google got flu wrong", Nature 2013 (news); "The parable of Google Flu: traps in big data analysis", Science 2014 (policy forum).

19 Google Flu Trends. Early detection -> rapid response -> reduced impact. The idea is to monitor health-seeking behavior in the form of online web search queries: the relative frequency of certain queries is highly correlated with the percentage of physician visits, so it can be used to estimate the current level of weekly influenza activity.

20 Data: hundreds of billions of individual searches from Google search logs (2003-2008), turned into time series of weekly counts for the 50 million most common search queries, each normalized by dividing by the total number of queries in that week; and the percentage of ILI-related physician visits from the CDC. Goal: estimate the percentage of physician visits for influenza-like illness (ILI).

21 Estimate the probability P that a random physician visit is related to influenza-like illness. Key insight: the probability Q that a random search query is ILI-related can be used to approximate P. Next steps: pick a model that relates P to Q, and determine which queries are ILI-related.
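Concretely, Q for a given week is the number of searches for the queries deemed ILI-related divided by the total number of searches that week, which is the normalization described on slide 20. A minimal sketch, assuming the weekly counts are already aggregated per query; the function name, query strings, and numbers are made up for illustration.

```python
def ili_query_fraction(weekly_counts, weekly_totals, ili_queries):
    """Return Q for each week: the share of all queries that are ILI-related.

    weekly_counts: dict mapping a query string to its list of weekly counts
    weekly_totals: total number of queries issued in each week
    ili_queries:   the queries currently treated as ILI-related
    """
    fractions = []
    for week, total in enumerate(weekly_totals):
        ili_total = sum(weekly_counts[q][week] for q in ili_queries)
        fractions.append(ili_total / total)
    return fractions


# Hypothetical counts for two weeks; "football" is not treated as ILI-related.
counts = {
    "flu symptoms": [30, 90],
    "fever remedy": [20, 30],
    "football": [500, 400],
}
totals = [1_000_000, 1_200_000]
q_by_week = ili_query_fraction(counts, totals, ["flu symptoms", "fever remedy"])
```

Here `q_by_week` is `[5e-05, 0.0001]`: 50 of 1,000,000 queries in the first week and 120 of 1,200,000 in the second.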

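22 Model: $\operatorname{logit}(P) = a + b \cdot \operatorname{logit}(Q) + \varepsilon$, where $\operatorname{logit}(x) = \ln\frac{x}{1-x}$. Here P is the fraction of physician visits that are ILI-related (from CDC data) and Q is the fraction of search queries that are ILI-related. Training step: select the set of ILI-related queries (which determines Q) that best fits the model.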
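Given weekly series of P and Q, the coefficients a and b can be estimated by ordinary least squares on the logit-transformed values. A minimal standard-library sketch, assuming plain least squares for the error term and that every P and Q lies strictly between 0 and 1; the function names are illustrative, not from the original study.

```python
import math


def logit(x):
    """logit(x) = ln(x / (1 - x)), defined for 0 < x < 1."""
    return math.log(x / (1.0 - x))


def inv_logit(y):
    """Inverse of logit: maps a real number back to a probability."""
    return 1.0 / (1.0 + math.exp(-y))


def fit_logit_model(p_series, q_series):
    """Least-squares fit of logit(P) = a + b * logit(Q) + e; returns (a, b)."""
    xs = [logit(q) for q in q_series]
    ys = [logit(p) for p in p_series]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_p(a, b, q):
    """Estimated ILI visit fraction P for an observed query fraction Q."""
    return inv_logit(a + b * logit(q))
```

Once a and b are fitted on historical weeks, `predict_p` turns a new week's query fraction into an estimated ILI visit fraction for that week.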

23 Use each single query as Q and try all 50 million candidates one by one, favoring those that perform well across all 9 regions. Produce a sorted list of the highest-scoring queries, then decide how many of them to include in Q: N = 45.
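The ranking step can be sketched as scoring each candidate query by how well its time series tracks the ILI series, averaged over regions, and then keeping the top N. This is a simplified illustration: it averages plain Pearson correlations (the published procedure may differ in detail), the nested-dict data layout and toy series are assumptions, and choosing N itself (45 in the study) would additionally require evaluating the model fit as more queries are added.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant series."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_queries(query_series, ili_series_by_region):
    """Sort candidate queries by mean correlation with ILI across regions.

    query_series:         dict query -> dict region -> weekly query fractions
    ili_series_by_region: dict region -> weekly ILI visit fractions
    """
    scores = {
        query: mean(
            pearson(series_by_region[region], ili)
            for region, ili in ili_series_by_region.items()
        )
        for query, series_by_region in query_series.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def select_top_n(query_series, ili_series_by_region, n):
    """Keep the n best-ranked queries as the ILI-related set behind Q."""
    return rank_queries(query_series, ili_series_by_region)[:n]
```

Scoring one query at a time keeps the search linear in the number of candidates, which is what makes screening millions of queries practical.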

24 Results. Training: mean correlation 0.9 (min = 0.8, max = 0.96, across the 9 regions). Validation: 42 points for each region ( ); mean correlation 0.97 (min = 0.92, max = 0.99).

25 When Google got flu wrong. GFT did not perform well in the 2012 season. Earlier, in 2009, GFT badly underestimated ILI in the US at the start of the H1N1 pandemic; this was attributed to changes in people's search behavior as a result of the exceptional nature of the pandemic.

26 Most of the big data that have received popular attention are not the output of instruments designed to produce valid and reliable data amenable to scientific analysis. GFT screened 50 million search terms to fit 1152 data points. Remedy: combine multiple data sources and dynamically recalibrate GFT.
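Why is screening 50 million candidates against 1152 points risky? Among enough unrelated series, some will correlate with the target by chance alone. The toy simulation below uses pure Gaussian noise as hypothetical "candidate queries" and a made-up 20-week target (sizes chosen for speed, not GFT's real ones), and reports the best correlation found as the number of candidates grows.

```python
import math
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant series."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def best_chance_correlation(target, n_candidates, seed=0):
    """Largest correlation between `target` and pure-noise candidate series."""
    rng = random.Random(seed)
    best = -1.0
    for _ in range(n_candidates):
        noise = [rng.gauss(0.0, 1.0) for _ in target]  # unrelated by construction
        best = max(best, pearson(noise, target))
    return best


if __name__ == "__main__":
    rng = random.Random(42)
    target = [rng.gauss(0.0, 1.0) for _ in range(20)]  # 20 made-up "weeks"
    for n in (10, 1_000, 10_000):
        print(n, round(best_chance_correlation(target, n), 2))
```

Because every run with the same seed generates the same candidate stream, a larger run includes the candidates of a smaller one, so the best chance correlation can only stay the same or rise as more candidates are screened, even though none of them is related to the target.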

27 Algorithm dynamics. All empirical research stands on a foundation of measurement. Is the instrumentation actually capturing the theoretical construct of interest? Is the measurement stable and comparable across cases and over time? Are measurement errors systematic? GFT was an unstable reflection of the prevalence of flu because of algorithm dynamics affecting Google's search algorithm.

28 Algorithm dynamics. Algorithm dynamics are the changes made by engineers to improve the commercial service and by consumers in using that service. The Google search algorithm is not a static entity: for example, Google began providing suggested additional search terms (2011) and returning potential diagnoses for searches that include physical symptoms (2012).

29 GFT assumes that the relative search volume for certain terms is statically related to external events, but search behavior changes dynamically. Research subjects may also attempt to manipulate the data-generating process to meet their own goals (e.g., a "Google bomb"). Ironically, the more successful we become at monitoring the behavior of people using these open sources of information, the more tempting it will be to manipulate those signals.

30 Lessons: transparency and replicability; use big data to understand the unknown (e.g., GFT for finer granularity); study the algorithms; check whether patterns are robust by replicating them across time and with other data sources; study the evolution of the socio-technical systems embedded in our society. It's not just about the size of the data; it's about all data.

31 My Health Bank (健康存摺) and the Electronic Medical Record Exchange Center (電子病歷交換中心) have already reached the starting line of a learning healthcare system.

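32 The Epidemic Prevention Cloud (防疫雲) has begun trying machine-to-machine automatic data exchange, making infectious disease surveillance more timely and more economical.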

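33 Health Cloud (健康雲): cross-disciplinary research spanning law, economics, biomedicine, public health, statistics, and information science, with the hope of creating a research environment that is more respectful of individuals and more friendly.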

34 Privacy. The dispute about National Health Insurance data concerns not only personal privacy but also autonomy. Is there a right to opt out? One side: the data are de-identified, opt-out reduces the quality of the data, the use is for the public good, and the administration cost is too high. The other side: it's my decision, and IT brings the administration cost down. What if 30% opt out? Data quality goes down. But

35 Releasing data. Data -> user: de-identification (cellsecu system); data enclave (data center). User -> data: link unlinkable data sets; secure multiparty computation.

36 Dataset linkage problem. Linking several datasets can be very useful, but linkage is prohibited by law in many places due to privacy concerns. Secure multiparty computation (SMC) protocols might remedy the situation. We built a prototype system.
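The slides do not describe how the prototype works. As a hedged illustration of the basic idea behind SMC, the sketch below uses additive secret sharing (a standard textbook building block, swapped in here for illustration) to compute a secure sum: each party, say a hospital holding a private count, splits its value into random shares, so the total can be computed without any party revealing its own number. It assumes honest-but-curious parties, non-negative values smaller than the modulus, and secure channels; it shows secure aggregation only, not the record linkage the slide refers to, and all names are hypothetical.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is modulo a large number (arbitrary choice)


def share(value, n_parties):
    """Split `value` into n random additive shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Single-process simulation of a secure sum among len(private_values) parties."""
    n = len(private_values)
    # all_shares[i][j] is the share that party i sends to party j
    all_shares = [share(v, n) for v in private_values]
    # each party j adds up only the random-looking shares it received
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # publishing the partial sums reveals the total, not the individual inputs
    return sum(partial_sums) % MODULUS
```

For example, `secure_sum([120, 45, 310])` returns 475, while every individual share is a uniformly random number that says nothing on its own about the value it came from.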

37 Conclusions: data science has tremendous potential; healthcare analytics can have a profound impact on healthcare systems; autonomy and privacy issues have to be addressed; active participation is a possible option.


More information

CORPORATE OVERVIEW. Big Data. Shared. Simply. Securely.

CORPORATE OVERVIEW. Big Data. Shared. Simply. Securely. CORPORATE OVERVIEW Big Data. Shared. Simply. Securely. INTRODUCING PHEMI SYSTEMS PHEMI unlocks the power of your data with out-of-the-box privacy, sharing, and governance PHEMI Systems brings advanced

More information

Executive Briefing White Paper Plant Performance Predictive Analytics

Executive Briefing White Paper Plant Performance Predictive Analytics Executive Briefing White Paper Plant Performance Predictive Analytics A Data Mining Based Approach Abstract The data mining buzzword has been floating around the process industries offices and control

More information

Certificate Program in Applied Big Data Analytics in Dubai. A Collaborative Program offered by INSOFE and Synergy-BI

Certificate Program in Applied Big Data Analytics in Dubai. A Collaborative Program offered by INSOFE and Synergy-BI Certificate Program in Applied Big Data Analytics in Dubai A Collaborative Program offered by INSOFE and Synergy-BI Program Overview Today s manager needs to be extremely data savvy. They need to work

More information

WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley

WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley WROX Certified Big Data Analyst Program by AnalytixLabs and Wiley Disclaimer: This material is protected under copyright act AnalytixLabs, 2011. Unauthorized use and/ or duplication of this material or

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning. Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk

Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski trajkovski@nyus.edu.mk Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training

More information

An Overview of Modeling Bottlenecks in Healthcare Delivery

An Overview of Modeling Bottlenecks in Healthcare Delivery An Overview of Modeling Bottlenecks in Healthcare Delivery Ozgur M Araz PhD Information Risk and Operations Management Department September 13 th 2010 Houston Introduction Notes from: Building a better

More information

Signal and Information Processing

Signal and Information Processing The Fu Foundation School of Engineering and Applied Science Department of Electrical Engineering COLUMBIA UNIVERSITY IN THE CITY OF NEW YORK Signal and Information Processing Prof. John Wright SIGNAL AND

More information

The Evolvement of Big Data Systems

The Evolvement of Big Data Systems The Evolvement of Big Data Systems From the Perspective of an Information Security Application 2015 by Gang Chen, Sai Wu, Yuan Wang presented by Slavik Derevyanko Outline Authors and Netease Introduction

More information

Making Critical Connections: Predictive Analytics in Government

Making Critical Connections: Predictive Analytics in Government Making Critical Connections: Predictive Analytics in Improve strategic and tactical decision-making Highlights: Support data-driven decisions. Reduce fraud, waste and abuse. Allocate resources more effectively.

More information

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik Leading Genomics Diagnostic harma Discove Collab Shanghai Cambridge, MA Reykjavik Global leadership for using the genome to create better medicine WuXi NextCODE provides a uniquely proven and integrated

More information

Electronic Oral Health Risk Assessment Tools

Electronic Oral Health Risk Assessment Tools SCDI White Paper No. 1074 Approved by ADA Council on Dental Practice May 2013 ADA SCDI White Paper No. 1074 Electronic Oral Health Risk Assessment Tools 2013 Copyright 2013 American Dental Association.

More information

Log Mining Based on Hadoop s Map and Reduce Technique

Log Mining Based on Hadoop s Map and Reduce Technique Log Mining Based on Hadoop s Map and Reduce Technique ABSTRACT: Anuja Pandit Department of Computer Science, anujapandit25@gmail.com Amruta Deshpande Department of Computer Science, amrutadeshpande1991@gmail.com

More information

Big Data a threat or a chance?

Big Data a threat or a chance? Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Energy Efficient MapReduce

Energy Efficient MapReduce Energy Efficient MapReduce Motivation: Energy consumption is an important aspect of datacenters efficiency, the total power consumption in the united states has doubled from 2000 to 2005, representing

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

Big Data and Analytics: Challenges and Opportunities

Big Data and Analytics: Challenges and Opportunities Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif

More information

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012

MEDICAL DATA MINING. Timothy Hays, PhD. Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012 MEDICAL DATA MINING Timothy Hays, PhD Health IT Strategy Executive Dynamics Research Corporation (DRC) December 13, 2012 2 Healthcare in America Is a VERY Large Domain with Enormous Opportunities for Data

More information

Statistics for BIG data

Statistics for BIG data Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before

More information

Version 1.0. HEAL NY Phase 5 Health IT & Public Health Team. Version Released 1.0. HEAL NY Phase 5 Health

Version 1.0. HEAL NY Phase 5 Health IT & Public Health Team. Version Released 1.0. HEAL NY Phase 5 Health Statewide Health Information Network for New York (SHIN-NY) Health Information Exchange (HIE) for Public Health Use Case (Patient Visit, Hospitalization, Lab Result and Hospital Resources Data) Version

More information

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

CS 2750 Machine Learning. Lecture 1. Machine Learning.  CS 2750 Machine Learning. Lecture 1 Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x-5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott

More information

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013

Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013 Big Data & Analytics: Your concise guide (note the irony) Wednesday 27th November 2013 Housekeeping 1. Any questions coming out of today s presentation can be discussed in the bar this evening 2. OCF is

More information

Is Big Data Good for our Health? You Bet. Here s Why. By Cameron Warren and Merav Yuravlivker

Is Big Data Good for our Health? You Bet. Here s Why. By Cameron Warren and Merav Yuravlivker Is Big Data Good for our Health? You Bet. Here s Why. By Cameron Warren and Merav Yuravlivker The term Big Data is increasingly used in our everyday lives. But each mention of it means something different,

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

Big Data and Privacy. Fritz Henglein Dept. of Computer Science, University of Copenhagen. Finance IT Day Riga, 2015-03-26

Big Data and Privacy. Fritz Henglein Dept. of Computer Science, University of Copenhagen. Finance IT Day Riga, 2015-03-26 Big Data and Privacy Fritz Henglein Dept. of Computer Science, University of Copenhagen Finance IT Day Riga, 2015-03-26 About me Professor, Programming Languages and Systems, University of Copenhagen Director,

More information

(Big) Data Analytics: From Word Counts to Population Opinions

(Big) Data Analytics: From Word Counts to Population Opinions (Big) Data Analytics: From Word Counts to Population Opinions Mark Keane Insight@University College Dublin October 2014 ~ RSS ~ Edinburgh September 2014/EPIC 2 September 2014/EPIC 3 September 2014/EPIC

More information