Speaker First Plenary Session THE USE OF "BIG DATA" - WHERE ARE WE AND WHAT DOES THE FUTURE HOLD? William H. Crown, PhD
|
|
- Sybil Golden
- 8 years ago
- Views:
Transcription
1 Speaker First Plenary Session THE USE OF "BIG DATA" - WHERE ARE WE AND WHAT DOES THE FUTURE HOLD? William H. Crown, PhD Optum Labs Cambridge, MA, USA Statistical Methods and Machine Learning ISPOR International Meeting, Montreal, Canada
2 Overview Explosion in Data Availability Traditional Methods for Analyzing Observational Data Machine Learning Methods Widely used outside of health care especially in consumer retail Many methods Model development and testing approach Is more data better? Traditional focus on prediction versus estimation of treatment effects How Can We Find the Best of Both? Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 3 The Growing Availability of Data
3 Market Context Velocity Complexity Tests and Treatments (Medical, Lab, Pharmacy Claims, Standardized Costs) Health Risk Assessments Socioeconomic (Race, Income, Education, Language, ) Vital Signs Medication Orders Admissions, Discharges, Transfers Patient Health Survey (PHQ-9) Health Survey Measurement (SF-12, SF-36) Care Coaching Engagements Evidence Based Medicine (Recommended Care Pathways) Mobile Applications / Social Networking Medical Research Genomic Volume Future Variety Gartner model, adapted Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 5 Examples of Data Partnerships m Multi-Stakeholder Life Sciences/Data and Analytics PCORI CDRN PCORI PCORnet FDA Sentinel m Government Life Sciences/Payer Delivery System/Partners Payer/IT Life Sciences/PBM Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 6
4 Traditional Health Services Research and Epidemiological Methods Statistical Analysis of Observational Data Good methods for developing well-matched control groups but no magic bullets--e.g., propensity score. These methods control only for observables. Do not control for endogeneity or confounding. Johnson, M., Crown, W., Martin, B., Dormuth, C., Siebert U Good Research Practices for Comparative Effectiveness Research: Analytic Methods to Improve Causal Inference from Nonrandomized Studies of Treatment Effects Using Secondary Data Sources. Report of the ISPOR Retrospective Database Analysis Task Force Part III. Value in Health 12(8): Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 8
5 Machine Learning Methods Machine Learning Methods Many methods: Classification Trees Neural Networks Random Forests Ridge and Lasso Regression Support Vector Machines And many others Hastie T., Tibshirani R., Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2 nd Edition. New York: Springer. Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 10
6 Basic Approach Use learning datasets to develop highly accurate classification algorithm. Apply algorithm to another dataset to predict classification. Rules should be as simple as possible while maintaining accuracy. Should be able to classify data without human intervention Should be efficient with very large datasets Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 11 Rob Schapire. Machine Learning Algorithms for Classification.
7 K-Fold Cross-Validation Randomly divide the full dataset into learning/validation datasets Randomly divide the learning/validation data into K equal subsamples (typically 5 or 10) For each subsample K, fit the data using the other K-1 subsamples Estimate the prediction error (e.g., sum of squared errors) for subsample K using the models estimated from the other K-1 subsamples Pick the model specification that generates the lowest average cross validation error Estimate the final model using the full dataset Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 13 Classification and Regression Trees Advantages Easily handle huge datasets Can include both qualitative and quantitative predictor variables Very good for missing or sparse data Small trees are easy to interpret Disadvantages Large trees are difficult to interpret Overall prediction performance tends to be poor Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 14
8 Rob Schapire. Machine Learning Algorithms for Classification. Rob Schapire. Machine Learning Algorithms for Classification.
9 Approach Pick a rule to subset data Using the rule, divide data into subsets Keep repeating until remaining subsets are almost pure (e.g, measured by entropy or gini index) Usual approach is to build a very large tree and then prune it back Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 17 Neural Networks Y 1 Y 2 Outcome Layer Y i =e z /(1+e z ) Z 1 Z 2 Z 3 Z 4 Hidden Layer Z i = f(b k X k ) X 1 X 2 X 3 Input Layer Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 18
10 Prediction Is Not the Same as Estimating Treatment Effects Some machine learning methods (e.g., Ridge and Lasso regression) use regression methods with a penalty term to adjust for the danger of overfitting. Enables application of the machine learning approach to the estimation of treatment effects. But we know that results from observational studies can be sensitive to spurious correlations and methodological approach Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 19 Things with strong trends will tend to be highly correlated (1) Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 20
11 Things With Strong Trends Will Tend to Be Highly Correlated (2) Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 21 Small Sample Sizes Can Generate Some Really Weird Findings Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 22
12 But Even With Big Data You Have To Be Careful! MI Outcome (Unmatched) MI Outcome (After Matching) HR=2.11 ( ) 111% (46%-204%) Risk Increase HR=0.69 ( ) 31% (7%-48%) Risk Reduction Cumulative Incidence Statin Initiators Statin Non-Initiators Cumulative Incidence Statin Non-Initiators Statin Initiators Months of Follow-Up Months of Follow-Up Seeger, John, Alexander Walker, Paige Williams, Gordon Saperia, Frank Sacks (2003) A Propensity Score-Matched Cohort Study of the Effect of Statins, Mainly Fluvastatin, on the Occurrence of Acute Myocardial Infarction. Am J. Cardiol 92: Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 23 Bigger Samples Don t Reduce Bias N=200 N=10,000 (,z)=0 (z,e)= Estimation Error iv ols iv ols Crown, W., Henk, H., VanNess D. Some Cautions on the Use of Instrumental Variables (IV) Estimators in Outcomes Research: How Bias in IV Estimators is Affected by Instrument Strength, Instrument Contamination, and Sample Size. Value in Health 14: , Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 24
13 EHR/Claims Linkages Can Help Reduce Missing Variable Bias Relevant information Claims alone EHR alone Linked data Clinical data and severity measures + + Retail/specialty drugs across treatment settings + + Leakage + + Patient-reported outcomes Selection biases due to payer type + + Longitudinality of patient follow-up + ++ Self-pay data + + Coding biases + + Unstructured data + + Timing of events + + Continuous coverage + ++ Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 25 Data from disparate sources can be linked and de-identified Name Address Birthdate SSN Phone, etc. Direct identifiers (EMR / Clinical) Primary hash Shared Salt Code (same for all contributors) Data is then hashed by contributors at their site. Name Address Birthdate Member ID Phone, etc. Direct identifiers (Insurer example) Primary hash Secondary hash Uses Confidential Salt De-identification Statistically de-identified views Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 26
14 Summary Rapid expansion in data (volume, velocity, and variety) Machine learning approaches focus on prediction but some can also be used to estimate treatment effects Machine learning methods offer opportunities for speed to answer but traditional challenges with observational data do not go away More data doesn t help with bias problems unless it helps with control variables through data linkage For treatment effect estimation still need to think about possible sources of bias and their implications for methodology and data used for model building Confidential property of Optum. Do not distribute or reproduce without express permission from Optum. 27 Speaker First Plenary Session THE USE OF "BIG DATA" - WHERE ARE WE AND WHAT DOES THE FUTURE HOLD? William H. Crown, PhD Optum Labs Cambridge, MA, USA
HIPAA and Big Data Twenty Third National HIPAA Summit. March 17, 2015 Mitchell W. Granberg, Optum Chief Privacy Officer
HIPAA and Big Data Twenty Third National HIPAA Summit March 17, 2015 Mitchell W. Granberg, Optum Chief Privacy Officer Overview HIPAA and Big Data Big Data Definitions Big Data and Health Care Benefits
More informationPREDICTIVE ANALYTICS: PROVIDING NOVEL APPROACHES TO ENHANCE OUTCOMES RESEARCH LEVERAGING BIG AND COMPLEX DATA
PREDICTIVE ANALYTICS: PROVIDING NOVEL APPROACHES TO ENHANCE OUTCOMES RESEARCH LEVERAGING BIG AND COMPLEX DATA IMS Symposium at ISPOR at Montreal June 2 nd, 2014 Agenda Topic Presenter Time Introduction:
More informationData Mining. Nonlinear Classification
Data Mining Unit # 6 Sajjad Haider Fall 2014 1 Nonlinear Classification Classes may not be separable by a linear boundary Suppose we randomly generate a data set as follows: X has range between 0 to 15
More informationFULL COVERAGE FOR PREVENTIVE MEDICATIONS AFTER MYOCARDIAL INFARCTION IMPACT ON RACIAL AND ETHNIC DISPARITIES
FULL COVERAGE FOR PREVENTIVE MEDICATIONS AFTER MYOCARDIAL INFARCTION IMPACT ON RACIAL AND ETHNIC DISPARITIES Niteesh K. Choudhry, MD, PhD Harvard Medical School Division of Pharmacoepidemiology and Pharmacoeconomics
More informationData Mining Practical Machine Learning Tools and Techniques
Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea
More informationSecondary Uses of Data for Comparative Effectiveness Research
Secondary Uses of Data for Comparative Effectiveness Research Paul Wallace MD Director, Center for Comparative Effectiveness Research The Lewin Group Paul.Wallace@lewin.com Disclosure/Perspectives Training:
More informationLecture 13: Validation
Lecture 3: Validation g Motivation g The Holdout g Re-sampling techniques g Three-way data splits Motivation g Validation techniques are motivated by two fundamental problems in pattern recognition: model
More informationBig Data Analytics for SCADA
ENERGY Big Data Analytics for SCADA Machine Learning Models for Fault Detection and Turbine Performance Elizabeth Traiger, Ph.D., M.Sc. 14 April 2016 1 SAFER, SMARTER, GREENER Points to Convey Big Data
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
More informationMarcus Wilson, PharmD. First Plenary Session
Moderator First Plenary Session THE USE OF "BIG DATA" - WHERE ARE WE AND WHAT DOES THE FUTURE HOLD? Marcus Wilson, PharmD HealthCore Wilmington, DE, USA Speakers First Plenary Session THE USE OF "BIG DATA"
More informationModel selection in R featuring the lasso. Chris Franck LISA Short Course March 26, 2013
Model selection in R featuring the lasso Chris Franck LISA Short Course March 26, 2013 Goals Overview of LISA Classic data example: prostate data (Stamey et. al) Brief review of regression and model selection.
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationUsing 'Big Data' to Estimate Benefits and Harms of Healthcare Interventions
Using 'Big Data' to Estimate Benefits and Harms of Healthcare Interventions Experience with ICES and CNODES DAVID HENRY, PROFESSOR OF HEALTH SYSTEMS DATA, UNIVERSITY OF TORONTO, SENIOR SCIENTIST, INSTITUTE
More informationRegularized Logistic Regression for Mind Reading with Parallel Validation
Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland
More informationDecision Trees from large Databases: SLIQ
Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values
More informationPackage acrm. R topics documented: February 19, 2015
Package acrm February 19, 2015 Type Package Title Convenience functions for analytical Customer Relationship Management Version 0.1.1 Date 2014-03-28 Imports dummies, randomforest, kernelfactory, ada Author
More informationHow is Big Data Different? A Paradigm Shift
How is Big Data Different? A Paradigm Shift Jennifer Clarke, Ph.D. Associate Professor Department of Statistics Department of Food Science and Technology University of Nebraska Lincoln ASA Snake River
More informationCOMMON METHODOLOGICAL ISSUES FOR CER IN BIG DATA
COMMON METHODOLOGICAL ISSUES FOR CER IN BIG DATA Harvard Medical School and Harvard School of Public Health sharon@hcp.med.harvard.edu December 2013 1 / 16 OUTLINE UNCERTAINTY AND SELECTIVE INFERENCE 1
More informationNew Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
More informationLeveraging electronic health records for predictive modeling of surgical complications. Grant Weller
Leveraging electronic health records for predictive modeling of surgical complications Grant Weller ISCB 2015 Utrecht NL August 26, 2015 Collaborators: David W. Larson, MD; Jenna Lovely, PharmD, RPh; Berton
More informationHow To Make A Credit Risk Model For A Bank Account
TRANSACTIONAL DATA MINING AT LLOYDS BANKING GROUP Csaba Főző csaba.fozo@lloydsbanking.com 15 October 2015 CONTENTS Introduction 04 Random Forest Methodology 06 Transactional Data Mining Project 17 Conclusions
More informationElectronic health records to study population health: opportunities and challenges
Electronic health records to study population health: opportunities and challenges Caroline A. Thompson, PhD, MPH Assistant Professor of Epidemiology San Diego State University Caroline.Thompson@mail.sdsu.edu
More informationCross Validation. Dr. Thomas Jensen Expedia.com
Cross Validation Dr. Thomas Jensen Expedia.com About Me PhD from ETH Used to be a statistician at Link, now Senior Business Analyst at Expedia Manage a database with 720,000 Hotels that are not on contract
More informationData Mining Methods: Applications for Institutional Research
Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014
More informationData Mining Techniques Chapter 6: Decision Trees
Data Mining Techniques Chapter 6: Decision Trees What is a classification decision tree?.......................................... 2 Visualizing decision trees...................................................
More informationEXECUTIVE SUMMARY...1 II.
EXTENDING COMPARATIVE EFFECTIVENESS RESEARCH AND MEDICAL PRODUCT SAFETY SURVEILLANCE CAPABILITY THROUGH LINKAGE OF ADMINISTRATIVE CLAIMS DATA WITH ELECTRONIC HEALTH RECORDS: A SENTINEL-PCORnet COLLABORATION
More informationClass #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris
Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines
More informationAn Overview of Data Mining: Predictive Modeling for IR in the 21 st Century
An Overview of Data Mining: Predictive Modeling for IR in the 21 st Century Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University AIRPO
More informationBOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING
BOOSTED REGRESSION TREES: A MODERN WAY TO ENHANCE ACTUARIAL MODELLING Xavier Conort xavier.conort@gear-analytics.com Session Number: TBR14 Insurance has always been a data business The industry has successfully
More informationPredictive Data modeling for health care: Comparative performance study of different prediction models
Predictive Data modeling for health care: Comparative performance study of different prediction models Shivanand Hiremath hiremat.nitie@gmail.com National Institute of Industrial Engineering (NITIE) Vihar
More informationUnderstanding Diagnosis Assignment from Billing Systems Relative to Electronic Health Records for Clinical Research Cohort Identification
Understanding Diagnosis Assignment from Billing Systems Relative to Electronic Health Records for Clinical Research Cohort Identification Russ Waitman Kelly Gerard Daniel W. Connolly Gregory A. Ator Division
More informationData Mining Algorithms Part 1. Dejan Sarka
Data Mining Algorithms Part 1 Dejan Sarka Join the conversation on Twitter: @DevWeek #DW2015 Instructor Bio Dejan Sarka (dsarka@solidq.com) 30 years of experience SQL Server MVP, MCT, 13 books 7+ courses
More informationL13: cross-validation
Resampling methods Cross validation Bootstrap L13: cross-validation Bias and variance estimation with the Bootstrap Three-way data partitioning CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna CSE@TAMU
More informationModel Validation Techniques
Model Validation Techniques Kevin Mahoney, FCAS kmahoney@ travelers.com CAS RPM Seminar March 17, 2010 Uses of Statistical Models in P/C Insurance Examples of Applications Determine expected loss cost
More informationICPSR Summer Program
ICPSR Summer Program Data Mining Tools for Exploring Big Data Department of Statistics Wharton School, University of Pennsylvania www-stat.wharton.upenn.edu/~stine Modern data mining combines familiar
More informationKnowledge Discovery and Data Mining. Bootstrap review. Bagging Important Concepts. Notes. Lecture 19 - Bagging. Tom Kelsey. Notes
Knowledge Discovery and Data Mining Lecture 19 - Bagging Tom Kelsey School of Computer Science University of St Andrews http://tom.host.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey ID5059-19-B &
More informationModel Combination. 24 Novembre 2009
Model Combination 24 Novembre 2009 Datamining 1 2009-2010 Plan 1 Principles of model combination 2 Resampling methods Bagging Random Forests Boosting 3 Hybrid methods Stacking Generic algorithm for mulistrategy
More informationBenchmarking of different classes of models used for credit scoring
Benchmarking of different classes of models used for credit scoring We use this competition as an opportunity to compare the performance of different classes of predictive models. In particular we want
More informationHealthcare data analytics. Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw
Healthcare data analytics Da-Wei Wang Institute of Information Science wdw@iis.sinica.edu.tw Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Unit # 10 Sajjad Haider Fall 2012 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right
More informationMachine Learning Methods for Causal Effects. Susan Athey, Stanford University Guido Imbens, Stanford University
Machine Learning Methods for Causal Effects Susan Athey, Stanford University Guido Imbens, Stanford University Introduction Supervised Machine Learning v. Econometrics/Statistics Lit. on Causality Supervised
More informationSmarter Healthcare@IBM Research. Joseph M. Jasinski, Ph.D. Distinguished Engineer IBM Research
Smarter Healthcare@IBM Research Joseph M. Jasinski, Ph.D. Distinguished Engineer IBM Research Our researchers work on a wide spectrum of topics Basic Science Industry specific innovation Nanotechnology
More informationDATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
More informationSafety & Effectiveness of Drug Therapies for Type 2 Diabetes: Are pharmacoepi studies part of the problem, or part of the solution?
Safety & Effectiveness of Drug Therapies for Type 2 Diabetes: Are pharmacoepi studies part of the problem, or part of the solution? IDEG Training Workshop Melbourne, Australia November 29, 2013 Jeffrey
More informationPredictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD
Predictive Analytics Techniques: What to Use For Your Big Data March 26, 2014 Fern Halper, PhD Presenter Proven Performance Since 1995 TDWI helps business and IT professionals gain insight about data warehousing,
More informationOverview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set
Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification
More informationResearch Skills for Non-Researchers: Using Electronic Health Data and Other Existing Data Resources
Research Skills for Non-Researchers: Using Electronic Health Data and Other Existing Data Resources James Floyd, MD, MS Sep 17, 2015 UW Hospital Medicine Faculty Development Program Objectives Become more
More informationEnsemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05
Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification
More informationUsing multiple models: Bagging, Boosting, Ensembles, Forests
Using multiple models: Bagging, Boosting, Ensembles, Forests Bagging Combining predictions from multiple models Different models obtained from bootstrap samples of training data Average predictions or
More informationID: FDA-2015-N-2048-0001:
October 26, 2015 Division of Dockets Management (HFA-305) Food and Drug Administration 5630 Fishers Lane Rm. 1061 Rockville, MD 20852 Submitted electronically via regulations.gov Re: Docket ID: FDA-2015-N-2048-0001:
More informationEnsemble Learning Better Predictions Through Diversity. Todd Holloway ETech 2008
Ensemble Learning Better Predictions Through Diversity Todd Holloway ETech 2008 Outline Building a classifier (a tutorial example) Neighbor method Major ideas and challenges in classification Ensembles
More informationCompetency 1 Describe the role of epidemiology in public health
The Northwest Center for Public Health Practice (NWCPHP) has developed competency-based epidemiology training materials for public health professionals in practice. Epidemiology is broadly accepted as
More informationIN THE CITY OF NEW YORK Decision Risk and Operations. Advanced Business Analytics Fall 2015
Advanced Business Analytics Fall 2015 Course Description Business Analytics is about information turning data into action. Its value derives fundamentally from information gaps in the economic choices
More informationComparing the Results of Support Vector Machines with Traditional Data Mining Algorithms
Comparing the Results of Support Vector Machines with Traditional Data Mining Algorithms Scott Pion and Lutz Hamel Abstract This paper presents the results of a series of analyses performed on direct mail
More informationCar Insurance. Havránek, Pokorný, Tomášek
Car Insurance Havránek, Pokorný, Tomášek Outline Data overview Horizontal approach + Decision tree/forests Vertical (column) approach + Neural networks SVM Data overview Customers Viewed policies Bought
More informationBig Data Analytics for Healthcare
Big Data Analytics for Healthcare Jimeng Sun Chandan K. Reddy Healthcare Analytics Department IBM TJ Watson Research Center Department of Computer Science Wayne State University 1 Healthcare Analytics
More informationStatistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
More informationBIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics http://www.ccmb.med.umich.edu/node/1376
Course Director: Dr. Kayvan Najarian (DCM&B, kayvan@umich.edu) Lectures: Labs: Mondays and Wednesdays 9:00 AM -10:30 AM Rm. 2065 Palmer Commons Bldg. Wednesdays 10:30 AM 11:30 AM (alternate weeks) Rm.
More informationData Mining Analysis (breast-cancer data)
Data Mining Analysis (breast-cancer data) Jung-Ying Wang Register number: D9115007, May, 2003 Abstract In this AI term project, we compare some world renowned machine learning tools. Including WEKA data
More informationData Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
More informationCS 2750 Machine Learning. Lecture 1. Machine Learning. http://www.cs.pitt.edu/~milos/courses/cs2750/ CS 2750 Machine Learning.
Lecture Machine Learning Milos Hauskrecht milos@cs.pitt.edu 539 Sennott Square, x5 http://www.cs.pitt.edu/~milos/courses/cs75/ Administration Instructor: Milos Hauskrecht milos@cs.pitt.edu 539 Sennott
More informationUsing Predictive Analytics to Reduce COPD Readmissions
Using Predictive Analytics to Reduce COPD Readmissions Agenda Information about PinnacleHealth Today s Environment PinnacleHealth Case Study Questions? PinnacleHealth System Non-profit, community teaching
More informationFollowing are detailed competencies which are addressed to various extents in coursework, field training and the integrative project.
MPH Epidemiology Following are detailed competencies which are addressed to various extents in coursework, field training and the integrative project. Biostatistics Describe the roles biostatistics serves
More informationBiomedical Big Data and Precision Medicine
Biomedical Big Data and Precision Medicine Jie Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago October 8, 2015 1 Explosion of Biomedical Data 2 Types
More informationPredictive Modeling and Big Data
Predictive Modeling and Presented by Eileen Burns, FSA, MAAA Milliman Agenda Current uses of predictive modeling in the life insurance industry Potential applications of 2 1 June 16, 2014 [Enter presentation
More informationData Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
More informationCross-validation for detecting and preventing overfitting
Cross-validation for detecting and preventing overfitting Note to other teachers and users of these slides. Andrew would be delighted if ou found this source material useful in giving our own lectures.
More informationPredicting Student Persistence Using Data Mining and Statistical Analysis Methods
Predicting Student Persistence Using Data Mining and Statistical Analysis Methods Koji Fujiwara Office of Institutional Research and Effectiveness Bemidji State University & Northwest Technical College
More informationGeneralizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel
Generalizing Random Forests Principles to other Methods: Random MultiNomial Logit, Random Naive Bayes, Anita Prinzie & Dirk Van den Poel Copyright 2008 All rights reserved. Random Forests Forest of decision
More informationMachine Learning Big Data using Map Reduce
Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories
More informationSpeaker Second Plenary Session ELECTRONIC HEALTH RECORDS FOR INFORMED HEALTH CARE IN ASIA- PACIFIC: LEARNING FROM EACH OTHER.
Speaker Second Plenary Session ELECTRONIC HEALTH RECORDS FOR INFORMED HEALTH CARE IN ASIA- PACIFIC: LEARNING FROM EACH OTHER Naoto Kume, PhD Kyoto University Kyoto Prefecture, Japan ISPOR 2014.09.08 11:15am-12:45pm
More informationData Mining Applications in Higher Education
Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2
More informationCar Insurance. Prvák, Tomi, Havri
Car Insurance Prvák, Tomi, Havri Sumo report - expectations Sumo report - reality Bc. Jan Tomášek Deeper look into data set Column approach Reminder What the hell is this competition about??? Attributes
More informationData mining is used to develop models for the early prediction of freshmen GPA. Since
1 USING DATA MINING TO PREDICT FRESHMEN OUTCOMES Nora Galambos, PhD Senior Data Scientist Office of Institutional Research, Planning & Effectiveness Stony Brook University Abstract Data mining is used
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationUse advanced techniques for summary and visualization of complex data for exploratory analysis and presentation.
MS Biostatistics MS Biostatistics Competencies Study Development: Work collaboratively with biomedical or public health researchers and PhD biostatisticians, as necessary, to provide biostatistical expertise
More informationFeature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification
Feature vs. Classifier Fusion for Predictive Data Mining a Case Study in Pesticide Classification Henrik Boström School of Humanities and Informatics University of Skövde P.O. Box 408, SE-541 28 Skövde
More informationCertified in Public Health (CPH) Exam CONTENT OUTLINE
NATIONAL BOARD OF PUBLIC HEALTH EXAMINERS Certified in Public Health (CPH) Exam CONTENT OUTLINE April 2014 INTRODUCTION This document was prepared by the National Board of Public Health Examiners for the
More informationSpeech and Network Marketing Model - A Review
Jastrzȩbia Góra, 16 th 20 th September 2013 APPLYING DATA MINING CLASSIFICATION TECHNIQUES TO SPEAKER IDENTIFICATION Kinga Sałapa 1,, Agata Trawińska 2 and Irena Roterman-Konieczna 1, 1 Department of Bioinformatics
More informationAutomatic Resolver Group Assignment of IT Service Desk Outsourcing
Automatic Resolver Group Assignment of IT Service Desk Outsourcing in Banking Business Padej Phomasakha Na Sakolnakorn*, Phayung Meesad ** and Gareth Clayton*** Abstract This paper proposes a framework
More informationMachine Learning Capacity and Performance Analysis and R
Machine Learning and R May 3, 11 30 25 15 10 5 25 15 10 5 30 25 15 10 5 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 0 2 4 6 8 101214161822 100 80 60 40 100 80 60 40 100 80 60 40 30 25 15 10 5 25 15 10
More informationJoseph M. Juran Center for Research in Supply Chain, Operations, and Quality
Joseph M. Juran Center for Research in Supply Chain, Operations, and Quality Professor Kevin Linderman Academic Co-Director of Joseph M. Juran Center for Research in Supply Chain, Operations, and Quality
More informationBig Challenges of Big Data - What are the statistical tasks for the precision medicine era?
Big Challenges of Big Data - What are the statistical tasks for the precision medicine era? Oct 18, 2015 Yu Shyr, Ph.D. Vanderbilt Center for Quantitative Sciences Highlights Overview of the BIG data in
More informationRole of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign
Role of Customer Response Models in Customer Solicitation Center s Direct Marketing Campaign Arun K Mandapaka, Amit Singh Kushwah, Dr.Goutam Chakraborty Oklahoma State University, OK, USA ABSTRACT Direct
More informationSAP/PHEMI Big Data Warehouse and the Transformation to Value-Based Health Care
PHEMI Health Systems Process Automation and Big Data Warehouse http://www.phemi.com SAP/PHEMI Big Data Warehouse and the Transformation to Value-Based Health Care Bringing Privacy and Performance to Big
More informationComparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
More informationData mining and statistical models in marketing campaigns of BT Retail
Data mining and statistical models in marketing campaigns of BT Retail Francesco Vivarelli and Martyn Johnson Database Exploitation, Segmentation and Targeting group BT Retail Pp501 Holborn centre 120
More information304 Predictive Informatics: What Is Its Place in Healthcare?
close window ANNUAL CONFERENCE AND EXHIBITION APRIL 4-8, 2009 / CHICAGO www.himssconference.org View PowerPoint Presentation Print PowerPoint Presentation Roundtable 304 Predictive Informatics: What Is
More informationTreatment of Low Risk MDS. Overview. Myelodysplastic Syndromes (MDS)
Overview Amy Davidoff, Ph.D., M.S. Associate Professor Pharmaceutical Health Services Research Department, Peter Lamy Center on Drug Therapy and Aging University of Maryland School of Pharmacy Clinical
More informationTechnical Approaches for Protecting Privacy in the PCORnet Distributed Research Network V1.0
Technical Approaches for Protecting Privacy in the PCORnet Distributed Research Network V1.0 Guidance Document Prepared by: PCORnet Data Privacy Task Force Submitted to the PMO Approved by the PMO Submitted
More informationSOLUTION BRIEF. SAP/PHEMI Big Data Warehouse and the Transformation to Value-Based Health Care
SOLUTION BRIEF SAP/PHEMI Big Data Warehouse and the Transformation to Value-Based Health Care Bringing Privacy and Performance to Big Data with SAP HANA and PHEMI Central Objectives Every healthcare organization
More informationNeural Networks for Sentiment Detection in Financial Text
Neural Networks for Sentiment Detection in Financial Text Caslav Bozic* and Detlef Seese* With a rise of algorithmic trading volume in recent years, the need for automatic analysis of financial news emerged.
More informationSupervised Learning (Big Data Analytics)
Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used
More informationHow to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
More informationOne Statistician s Perspectives on Statistics and "Big Data" Analytics
One Statistician s Perspectives on Statistics and "Big Data" Analytics Some (Ultimately Unsurprising) Lessons Learned Prof. Stephen Vardeman IMSE & Statistics Departments Iowa State University July 2014
More informationInsurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví. Pavel Kříž. Seminář z aktuárských věd MFF 4.
Insurance Analytics - analýza dat a prediktivní modelování v pojišťovnictví Pavel Kříž Seminář z aktuárských věd MFF 4. dubna 2014 Summary 1. Application areas of Insurance Analytics 2. Insurance Analytics
More informationFDA s Sentinel Initiative: Current Status and Future Plans. Janet Woodcock M.D. Director, CDER, FDA
FDA s Sentinel Initiative: Current Status and Future Plans Janet Woodcock M.D. Director, CDER, FDA FDA Amendments Act of 2007 Section 905: Active Postmarket Risk Identification and Analysis Establish a
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationImputation Methods to Deal with Missing Values when Data Mining Trauma Injury Data
Imputation Methods to Deal with Missing Values when Data Mining Trauma Injury Data Kay I Penny Centre for Mathematics and Statistics, Napier University, Craiglockhart Campus, Edinburgh, EH14 1DJ k.penny@napier.ac.uk
More informationWeather forecast prediction: a Data Mining application
Weather forecast prediction: a Data Mining application Ms. Ashwini Mandale, Mrs. Jadhawar B.A. Assistant professor, Dr.Daulatrao Aher College of engg,karad,ashwini.mandale@gmail.com,8407974457 Abstract
More information