BIG DATA: CONVENTIONAL METHODS MEET UNCONVENTIONAL DATA

Harvard Medical School & Harvard School of Public Health
October 14

THE SETTING

Unprecedented advances in data acquisition technologies
  The omics technologies
  Imaging data
  Telecommunication data
  Social networking data
  Medical record data and registries
Features of Big Data
  Number of variables (P) > number of people (N)
  Different data types & resolutions: natural language processed, claims, laboratory
  Potential to grow
Most focus is on solutions to storing, indexing, querying, and accessing Big Data
Less focus is on statistical inference: turning data into knowledge

BIG ISSUE - 1

Selecting the correct approach for confounding adjustment when there are many potential confounders
  Rarely know the exact confounders required to satisfy the no-unmeasured-confounding assumption
  Rarely know the identity of subgroups exhibiting heterogeneous treatment effects
Level of uncertainty is:
  Substantially increased in big data settings
  Typically ignored in computations
Require approaches to account for such uncertainties in making regulatory decisions

BIG ISSUE - 2

How much data pooling is permitted for making safety and effectiveness decisions?
All empirical studies pool information
  Survival analysis: event times are averaged or pooled across patients receiving device A and compared to pooled event times among device B patients
Pooling across different units
  Pool information from different countries to learn about device effectiveness in a particular subpopulation
  Pool information from many different manufacturers' devices to learn about a specific manufacturer's device
Require a clear understanding of the theoretical assumptions, an approach to quantify the amount of pooling, and the implications of pooling

BIG ISSUE - 2 (cont.)

Pool information from many different manufacturers' devices to learn about a specific manufacturer's device
  $i = 1, 2, \ldots, n_j$ patients implanted with manufacturer $j$'s device
  $j = 1, 2, \ldots, J$ manufacturers of the device
  $y_{ji}$ = mean outcome for patient $i$ implanted with device $j$
  $y_{ji} \sim N(\alpha_j, \sigma^2_{y,j})$ and $\alpha_j \sim N(\mu_\alpha, \sigma^2_\alpha)$
For each manufacturer $j$, the estimate of $\alpha_j$ is
  $\hat{\alpha}_j = \omega_j \mu_\alpha + (1 - \omega_j)\,\bar{y}_j$
  $\omega_j = \dfrac{\sigma^2_{y,j} / n_j}{\sigma^2_\alpha + \sigma^2_{y,j} / n_j}$, with $0 \le \omega_j \le 1$
  $\omega_j = 0$: no pooling; $\omega_j = 1$: complete pooling
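The pooling weight on this slide can be illustrated with a short Python sketch. It assumes the variance components and the overall mean are already known, whereas in practice they would be estimated from the data. The function name `partial_pooling` and the example numbers are hypothetical and do not come from the presentation.

```python
from statistics import mean


def partial_pooling(outcomes_by_manufacturer, mu_alpha, sigma2_alpha, sigma2_y):
    """Return {manufacturer: (omega_j, alpha_hat_j)}.

    outcomes_by_manufacturer: dict mapping manufacturer j to a list of patient outcomes y_ji
    mu_alpha, sigma2_alpha:   mean and variance of the manufacturer effects (assumed known)
    sigma2_y:                 dict mapping manufacturer j to its outcome variance (assumed known)
    """
    estimates = {}
    for j, outcomes in outcomes_by_manufacturer.items():
        n_j = len(outcomes)
        y_bar_j = mean(outcomes)
        # Sampling variance of the manufacturer-specific mean.
        sampling_var = sigma2_y[j] / n_j
        # omega_j = 0 means no pooling; omega_j = 1 means complete pooling.
        omega_j = sampling_var / (sigma2_alpha + sampling_var)
        alpha_hat_j = omega_j * mu_alpha + (1 - omega_j) * y_bar_j
        estimates[j] = (omega_j, alpha_hat_j)
    return estimates


# Hypothetical data: manufacturer "A" has few patients, "B" has many.
example = partial_pooling(
    {"A": [1.0, 3.0], "B": [2.0] * 200},
    mu_alpha=0.0,
    sigma2_alpha=1.0,
    sigma2_y={"A": 2.0, "B": 2.0},
)
```

With these inputs the small manufacturer is pulled halfway toward the overall mean, while the large manufacturer's estimate stays close to its own sample mean. This is the sense in which the weight quantifies the amount of pooling.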

BIG ISSUE - 3

The role of missing data in big data
  Risk of missing data is higher in big data
  Standard strategies for filling in missing data have not been tested
    Multiple imputation
  Partially or completely missing variables
  Different missingness mechanisms
    Not collected in one registry vs. patients too sick to have the variable measured
  Mixture models for the missingness mechanism
Missing data strategies in big data settings require systematic study
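The slide names multiple imputation without describing it further. As a hedged illustration, the sketch below shows only the final combining step, using Rubin's rules, which is the standard way to pool estimates across imputed data sets. The slide does not say which combining rule the speaker had in mind, so that choice is an assumption. The function name `pool_rubin` and the example numbers are also hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Combine results from m imputed data sets using Rubin's rules.

    estimates:        point estimate from each imputed data set (m >= 2)
    within_variances: squared standard error from each imputed data set
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(within_variances)   # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Hypothetical results from three imputed data sets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the uncertainty due to missingness. A single filled-in data set would omit it and understate the variance.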

OTHER BIG ISSUES

New theoretical underpinnings of asymptotic theory:
  Large p, small n: what happens when p goes to infinity faster than n?
  Large p, large n: what happens when p and n go to infinity at the same rate?
Dimensionality and sparsity issues - how to reduce dimensionality?
  Global sparsity: in genomics, expression levels are available for thousands of genes, but only a handful are likely to be predictive of a specific phenotypic trait (LASSO methods)
  Local sparsity: partition of the p-dimensional space such that, within each region, the outcome depends upon a small number of the p variables (regression trees)
  Mixture sparsity: data arise from several simple models (mixture models)
How to measure strength of evidence?
  p-values are driven by sample size
  Bayes factors are a good solution
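The claim that p-values are driven by sample size can be illustrated with a minimal Python sketch. It assumes a two-sided one-sample z-test with a known standard deviation. The test, the function name `two_sided_p_value`, and the effect size are illustrative choices and are not taken from the presentation.

```python
import math


def two_sided_p_value(effect, sd, n):
    """Two-sided p-value of a z-test for a mean difference `effect`
    observed in n observations with known standard deviation `sd`."""
    z = effect / (sd / math.sqrt(n))
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# The same tiny effect (0.02 standard deviations) at growing sample sizes.
for n in (100, 10_000, 1_000_000):
    print(n, two_sided_p_value(effect=0.02, sd=1.0, n=n))
```

The effect size is held fixed, so only the sample size changes between rows. The p-value moves from clearly non-significant to extremely small, even though the practical importance of the effect is unchanged.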
