Evidence to Action: Use of Predictive Models for Beach Water Postings

1 Evidence to Action: Use of Predictive Models for Beach Water Postings
Canadian Society for Epidemiology and Biostatistics
Caitlyn Paget, June 4th, 2015

2 Goal is to improve program delivery
Can we improve the accuracy of our beach postings by predicting the water quality based on beach inspection data?
As applied research, the project had some constraints:
- Using an administrative dataset, limited to pre-existing variables
- Following modeling methods suggested by the USEPA
  - Linear regression with log-transformed E. coli as the response variable
  - Using Virtual Beach, a customized regression software package
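To make the modeling constraint concrete, here is a minimal sketch of a linear regression with log-transformed E. coli as the response. It is not the Virtual Beach implementation. The single predictor (turbidity), the readings, and the function names are made up for illustration.

```python
import math

def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up illustrative readings (NOT from the York data set).
turbidity_ntu = [2.0, 4.0, 6.0, 8.0, 10.0]
ecoli_per_100ml = [10.0, 20.0, 40.0, 80.0, 160.0]

# The response variable is log10(E. coli), not the raw count.
log_ecoli = [math.log10(c) for c in ecoli_per_100ml]
intercept, slope = fit_simple_ols(turbidity_ntu, log_ecoli)

def predict_ecoli(turbidity):
    """Predict on the log scale, then back-transform to counts per 100 mL."""
    return 10 ** (intercept + slope * turbidity)

# A prediction can then be compared with the 100 E. coli / 100 mL posting threshold.
print(round(predict_ecoli(9.0), 1))
```

Fitting on the log scale keeps occasional very high counts from dominating the fit. A real model would use several inspection variables rather than one.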

3 OPHS mandates weekly inspections
- ≤ 100 E. coli / 100 mL: Beach Open
- > 100 E. coli / 100 mL: Beach Posted
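The posting rule on this slide can be sketched as a small function. The threshold of 100 E. coli per 100 mL comes from the slide. Combining a beach's samples with a geometric mean is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

POSTING_THRESHOLD = 100.0  # E. coli per 100 mL, as stated on the slide

def geometric_mean(counts):
    """Geometric mean of sample counts.

    Assumption: samples are combined with a geometric mean, and counts
    below 1 are floored at 1 so that the logarithm is defined.
    """
    logs = [math.log(max(c, 1.0)) for c in counts]
    return math.exp(sum(logs) / len(logs))

def beach_status(counts):
    """Return 'Posted' when the combined result exceeds the threshold, else 'Open'."""
    return "Posted" if geometric_mean(counts) > POSTING_THRESHOLD else "Open"

print(beach_status([50, 60, 70, 80, 90]))       # all samples well below 100
print(beach_status([150, 200, 120, 300, 110]))  # all samples above 100
```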

4 E. coli spikes are often transient
[Figure: E. coli concentration (count / 100 mL water) by week, from the week of 10-Jun to the week of 26-Aug]

5 York has a relatively rich data set
Daily beach inspections:
- 5 or 6 water samples per beach
- Record environmental conditions
- Weather data from two stations
Improvements over time:
- 2006: Standardized observations
- 2008: Purchased turbidometers
- 2011: Enhanced sampling frequency
- 2012: Purchased UV & wind meters
- Expanded enhanced sampling

6 Best subsets modeling technique
Explored variable transformation:
- Decision threshold of at least 40% improvement in Pearson coefficient
Reduced set of variables under consideration:
- Generated sample models using repeated genetic algorithms
- Compared model accuracy at different E. coli thresholds
Generated candidate models:
- Tried all variable combinations up to a maximum of 10
- Optimized for predicted residual sum of squares
Cross-validation for final model selection:
- Random cross-validation: 1000 iterations setting aside 20-25% of data
- Simulated results for each beach season
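The candidate-model step (try all variable combinations, keep the one with the lowest predicted residual sum of squares, PRESS) can be sketched as follows. This is an illustration under stated assumptions, not what Virtual Beach does internally. The three predictors and all values are made up, PRESS is computed by explicitly refitting without each observation, and the helper names are hypothetical.

```python
from itertools import combinations

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

def press(rows, y):
    """PRESS: leave each observation out, refit, and sum the squared prediction errors."""
    total = 0.0
    for i in range(len(y)):
        beta = fit_ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        total += (y[i] - predict(beta, rows[i])) ** 2
    return total

def best_subset(data, y, names, max_vars):
    """Try every combination of up to max_vars predictors; return (lowest PRESS, names)."""
    best = None
    for k in range(1, max_vars + 1):
        for cols in combinations(range(len(names)), k):
            rows = [[r[c] for c in cols] for r in data]
            score = press(rows, y)
            if best is None or score < best[0]:
                best = (score, [names[c] for c in cols])
    return best

# Made-up illustrative data (NOT from the York data set).
turbidity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
rain = [0.0, 5.0, 0.0, 2.0, 8.0, 1.0, 0.0, 3.0, 6.0, 0.0]
wave = [0.3, 0.1, 0.4, 0.2, 0.5, 0.1, 0.3, 0.2, 0.4, 0.1]
log_ecoli = [1.17, 1.27, 1.46, 1.64, 1.73, 1.93, 2.01, 2.21, 2.34, 2.52]

data = list(zip(turbidity, rain, wave))
score, chosen = best_subset(data, log_ecoli, ["turbidity", "rain", "wave"], max_vars=3)
print(chosen, round(score, 4))
```

The number of combinations grows quickly with the number of variables, which is one reason to reduce the variable set before this step.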

7 Build a specific model for each beach

8 Overall pilot results are positive
Results by beach:
- Beach #1: 2012: Significantly better on all measures; 2013: Moderately better on all measures; 2014: Moderately better on all measures
- Beach #2: 2012: Fewer missed postings; 2013: Same accuracy, more missed postings; 2014: Fewer unnecessary postings
- Beach #3: 2012: Fewer missed postings; 2013: No difference; 2014: Moderately better on all measures
- Beach #4: No difference
- Beach #5: 2012: No difference; 2013: Fewer unnecessary postings; 2014: More missed postings, fewer unnecessary ones
- Beach #6: Fewer missed postings, more unnecessary ones; Fewer missed postings
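The measures named in these results can be computed as in the sketch below. It assumes a "missed posting" means the beach should have been posted (observed result above 100) but the prediction said open, and an "unnecessary posting" means the reverse. These definitions, the function name, and the example values are illustrative assumptions.

```python
THRESHOLD = 100.0  # E. coli per 100 mL

def posting_metrics(predicted, observed, threshold=THRESHOLD):
    """Count missed and unnecessary postings and compute overall accuracy."""
    missed = unnecessary = correct = 0
    for p, o in zip(predicted, observed):
        should_post = o > threshold  # what the lab result says
        would_post = p > threshold   # what the model would have done
        if should_post and not would_post:
            missed += 1
        elif would_post and not should_post:
            unnecessary += 1
        else:
            correct += 1
    return {
        "accuracy": correct / len(observed),
        "missed": missed,
        "unnecessary": unnecessary,
    }

# Made-up example: five days of predicted and lab-confirmed concentrations.
predicted = [50, 150, 80, 200, 120]
observed = [40, 180, 130, 90, 110]
print(posting_metrics(predicted, observed))
```

Applying the same function to the existing posting method and to the model gives a like-for-like comparison for each beach and season.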

9 Model updates part of ongoing work
- All new models; no changes at midseason check-in
- Updated either model or betas based on prior model performance; no changes at midseason check-in
- 2014: Updated models for 4 beaches; compared updated models with background updated beta models; updated model again midseason for 1 beach, improved performance
Model update compared to betas update, by beach:
- Beach #1: Significantly better on all measures
- Beach #2: Fewer missed postings, more unnecessary ones
- Beach #4: Better accuracy, fewer unnecessary postings
- Beach #5: No difference

10 Complex program to communicate
- Predictive modeling is more challenging for risk communication
  - Persistence system is more intuitive
  - Lab results are the gold standard, except for the 24-hour delay in results
- Ideally communication is simple
  - No changes needed to the posting signs

11 Success leads to more work
The pilot showed that it is possible to improve the accuracy of beach postings using beach-inspection-based predictive models.
However, we'd like to see better consistency in model performance across all years and beaches.
- Improve modeling methods: perform cross-validation for all possible variable combinations to determine candidate models based on better performance simulations. (Requires leaving the Virtual Beach software.)
- Continue to improve data quality.
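The random cross-validation described in the deck (1000 iterations, setting aside 20-25% of the data) could be run outside Virtual Beach along these lines. This is a minimal sketch with one made-up predictor, scored by how often the predicted posting decision matches the observed one. The function names, the data, and the scoring choice are assumptions for illustration.

```python
import math
import random

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def random_cv_accuracy(x, log_y, iterations=1000, holdout=0.25,
                       threshold=100.0, seed=0):
    """Repeated random hold-out: fit on the rest, score posting decisions on the held-out part."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(x)
    n_test = max(1, round(n * holdout))
    log_threshold = math.log10(threshold)
    correct = total = 0
    for _ in range(iterations):
        test_idx = set(rng.sample(range(n), n_test))
        train = [i for i in range(n) if i not in test_idx]
        a, b = fit_line([x[i] for i in train], [log_y[i] for i in train])
        for i in test_idx:
            predicted_post = (a + b * x[i]) > log_threshold
            observed_post = log_y[i] > log_threshold
            correct += predicted_post == observed_post
            total += 1
    return correct / total

# Made-up illustrative data (NOT from the York data set): log10 E. coli vs. turbidity.
turbidity = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
log_ecoli = [1.2, 1.4, 1.3, 1.6, 1.8, 1.7, 2.1, 1.9, 2.2, 2.4, 2.3, 2.6]

print(random_cv_accuracy(turbidity, log_ecoli))
```

To compare candidate models, the same routine would be called once per variable combination, each time with that combination's fitting function, and the hold-out accuracies compared.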

12 Acknowledgements
- Debeka Navaranjan, Epidemiologist
- Bernie Mayer, Safe Water Manager
- Beach Water Public Health Inspector Students
Questions?

Cross Validation. Dr. Thomas Jensen Expedia.com

Cross Validation. Dr. Thomas Jensen Expedia.com Cross Validation Dr. Thomas Jensen Expedia.com About Me PhD from ETH Used to be a statistician at Link, now Senior Business Analyst at Expedia Manage a database with 720,000 Hotels that are not on contract

More information

5. Multiple regression

5. Multiple regression 5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

More information

Data Mining Practical Machine Learning Tools and Techniques

Data Mining Practical Machine Learning Tools and Techniques Ensemble learning Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 8 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Combining multiple models Bagging The basic idea

More information

How To Bet On An Nfl Football Game With A Machine Learning Program

How To Bet On An Nfl Football Game With A Machine Learning Program Beating the NFL Football Point Spread Kevin Gimpel kgimpel@cs.cmu.edu 1 Introduction Sports betting features a unique market structure that, while rather different from financial markets, still boasts

More information

Regularized Logistic Regression for Mind Reading with Parallel Validation

Regularized Logistic Regression for Mind Reading with Parallel Validation Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland

More information

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model

Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort xavier.conort@gear-analytics.com Motivation Location matters! Observed value at one location is

More information

Industry Environment and Concepts for Forecasting 1

Industry Environment and Concepts for Forecasting 1 Table of Contents Industry Environment and Concepts for Forecasting 1 Forecasting Methods Overview...2 Multilevel Forecasting...3 Demand Forecasting...4 Integrating Information...5 Simplifying the Forecast...6

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

Data driven approach in analyzing energy consumption data in buildings. Office of Environmental Sustainability Ian Tan

Data driven approach in analyzing energy consumption data in buildings. Office of Environmental Sustainability Ian Tan Data driven approach in analyzing energy consumption data in buildings Office of Environmental Sustainability Ian Tan Background Real time energy consumption data of buildings in terms of electricity (kwh)

More information

APPENDIX 15. Review of demand and energy forecasting methodologies Frontier Economics

APPENDIX 15. Review of demand and energy forecasting methodologies Frontier Economics APPENDIX 15 Review of demand and energy forecasting methodologies Frontier Economics Energex regulatory proposal October 2014 Assessment of Energex s energy consumption and system demand forecasting procedures

More information

College Readiness LINKING STUDY

College Readiness LINKING STUDY College Readiness LINKING STUDY A Study of the Alignment of the RIT Scales of NWEA s MAP Assessments with the College Readiness Benchmarks of EXPLORE, PLAN, and ACT December 2011 (updated January 17, 2012)

More information

The Operational Value of Social Media Information. Social Media and Customer Interaction

The Operational Value of Social Media Information. Social Media and Customer Interaction The Operational Value of Social Media Information Dennis J. Zhang (Kellogg School of Management) Ruomeng Cui (Kelley School of Business) Santiago Gallino (Tuck School of Business) Antonio Moreno-Garcia

More information

ENERGY STAR for Data Centers

ENERGY STAR for Data Centers ENERGY STAR for Data Centers Alexandra Sullivan US EPA, ENERGY STAR February 4, 2010 Agenda ENERGY STAR Buildings Overview Energy Performance Ratings Portfolio Manager Data Center Initiative Objective

More information

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480 1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

More information

Decision Trees from large Databases: SLIQ

Decision Trees from large Databases: SLIQ Decision Trees from large Databases: SLIQ C4.5 often iterates over the training set How often? If the training set does not fit into main memory, swapping makes C4.5 unpractical! SLIQ: Sort the values

More information

Building risk prediction models - with a focus on Genome-Wide Association Studies. Charles Kooperberg

Building risk prediction models - with a focus on Genome-Wide Association Studies. Charles Kooperberg Building risk prediction models - with a focus on Genome-Wide Association Studies Risk prediction models Based on data: (D i, X i1,..., X ip ) i = 1,..., n we like to fit a model P(D = 1 X 1,..., X p )

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Determining Minimum Sample Sizes for Estimating Prediction Equations for College Freshman Grade Average

Determining Minimum Sample Sizes for Estimating Prediction Equations for College Freshman Grade Average A C T Research Report Series 87-4 Determining Minimum Sample Sizes for Estimating Prediction Equations for College Freshman Grade Average Richard Sawyer March 1987 For additional copies write: ACT Research

More information

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland

More information

SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

Local classification and local likelihoods

Local classification and local likelihoods Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor

More information

Bathing Water Directive report 2013. Denmark

Bathing Water Directive report 2013. Denmark Bathing Water Directive report 2013 Denmark The report gives a general overview of information acquired from the reported data, based on provisions of the Bathing Water Directive 1. The reporting process

More information

Short-Term Forecasting in Retail Energy Markets

Short-Term Forecasting in Retail Energy Markets Itron White Paper Energy Forecasting Short-Term Forecasting in Retail Energy Markets Frank A. Monforte, Ph.D Director, Itron Forecasting 2006, Itron Inc. All rights reserved. 1 Introduction 4 Forecasting

More information

Improving Demand Forecasting

Improving Demand Forecasting Improving Demand Forecasting 2 nd July 2013 John Tansley - CACI Overview The ideal forecasting process: Efficiency, transparency, accuracy Managing and understanding uncertainty: Limits to forecast accuracy,

More information

Introduction to Logistic Regression

Introduction to Logistic Regression OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction

More information

Data Mining Methods: Applications for Institutional Research

Data Mining Methods: Applications for Institutional Research Data Mining Methods: Applications for Institutional Research Nora Galambos, PhD Office of Institutional Research, Planning & Effectiveness Stony Brook University NEAIR Annual Conference Philadelphia 2014

More information

Presentation by: Ahmad Alsahaf. Research collaborator at the Hydroinformatics lab - Politecnico di Milano MSc in Automation and Control Engineering

Presentation by: Ahmad Alsahaf. Research collaborator at the Hydroinformatics lab - Politecnico di Milano MSc in Automation and Control Engineering Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen 9-October 2015 Presentation by: Ahmad Alsahaf Research collaborator at the Hydroinformatics lab - Politecnico di

More information

Robust procedures for Canadian Test Day Model final report for the Holstein breed

Robust procedures for Canadian Test Day Model final report for the Holstein breed Robust procedures for Canadian Test Day Model final report for the Holstein breed J. Jamrozik, J. Fatehi and L.R. Schaeffer Centre for Genetic Improvement of Livestock, University of Guelph Introduction

More information

Power Monitoring for Modern Data Centers

Power Monitoring for Modern Data Centers Power Monitoring for Modern Data Centers December 2010/AT304 by Reza Tajali, P.E. Square D Power Systems Engineering Make the most of your energy SM Revision #1 12/10 By their nature, mission critical

More information

Smoothing methods. Marzena Narodzonek-Karpowska. Prof. Dr. W. Toporowski Institut für Marketing & Handel Abteilung Handel

Smoothing methods. Marzena Narodzonek-Karpowska. Prof. Dr. W. Toporowski Institut für Marketing & Handel Abteilung Handel Smoothing methods Marzena Narodzonek-Karpowska Prof. Dr. W. Toporowski Institut für Marketing & Handel Abteilung Handel What Is Forecasting? Process of predicting a future event Underlying basis of all

More information

A Regime-Switching Model for Electricity Spot Prices. Gero Schindlmayr EnBW Trading GmbH g.schindlmayr@enbw.com

A Regime-Switching Model for Electricity Spot Prices. Gero Schindlmayr EnBW Trading GmbH g.schindlmayr@enbw.com A Regime-Switching Model for Electricity Spot Prices Gero Schindlmayr EnBW Trading GmbH g.schindlmayr@enbw.com May 31, 25 A Regime-Switching Model for Electricity Spot Prices Abstract Electricity markets

More information

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS

FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS FRAUD DETECTION IN ELECTRIC POWER DISTRIBUTION NETWORKS USING AN ANN-BASED KNOWLEDGE-DISCOVERY PROCESS Breno C. Costa, Bruno. L. A. Alberto, André M. Portela, W. Maduro, Esdras O. Eler PDITec, Belo Horizonte,

More information

Forecasting the first step in planning. Estimating the future demand for products and services and the necessary resources to produce these outputs

Forecasting the first step in planning. Estimating the future demand for products and services and the necessary resources to produce these outputs PRODUCTION PLANNING AND CONTROL CHAPTER 2: FORECASTING Forecasting the first step in planning. Estimating the future demand for products and services and the necessary resources to produce these outputs

More information

Pearson's Correlation Tests

Pearson's Correlation Tests Chapter 800 Pearson's Correlation Tests Introduction The correlation coefficient, ρ (rho), is a popular statistic for describing the strength of the relationship between two variables. The correlation

More information

MONTE CARLO SIMULATION FOR INSURANCE AGENCY CONTINGENT COMMISSION

MONTE CARLO SIMULATION FOR INSURANCE AGENCY CONTINGENT COMMISSION Proceedings of the 2013 Winter Simulation Conference R. Pasupathy, S.-H. Kim, A. Tolk, R. Hill, and M. E. Kuhl, eds MONTE CARLO SIMULATION FOR INSURANCE AGENCY CONTINGENT COMMISSION Mark Grabau Advanced

More information

Demand forecasting & Aggregate planning in a Supply chain. Session Speaker Prof.P.S.Satish

Demand forecasting & Aggregate planning in a Supply chain. Session Speaker Prof.P.S.Satish Demand forecasting & Aggregate planning in a Supply chain Session Speaker Prof.P.S.Satish 1 Introduction PEMP-EMM2506 Forecasting provides an estimate of future demand Factors that influence demand and

More information

2016 ERCOT System Planning Long-Term Hourly Peak Demand and Energy Forecast December 31, 2015

2016 ERCOT System Planning Long-Term Hourly Peak Demand and Energy Forecast December 31, 2015 2016 ERCOT System Planning Long-Term Hourly Peak Demand and Energy Forecast December 31, 2015 2015 Electric Reliability Council of Texas, Inc. All rights reserved. Long-Term Hourly Peak Demand and Energy

More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

Ch.3 Demand Forecasting.

Ch.3 Demand Forecasting. Part 3 : Acquisition & Production Support. Ch.3 Demand Forecasting. Edited by Dr. Seung Hyun Lee (Ph.D., CPL) IEMS Research Center, E-mail : lkangsan@iems.co.kr Demand Forecasting. Definition. An estimate

More information

Prediction Model for Crude Oil Price Using Artificial Neural Networks

Prediction Model for Crude Oil Price Using Artificial Neural Networks Applied Mathematical Sciences, Vol. 8, 2014, no. 80, 3953-3965 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.43193 Prediction Model for Crude Oil Price Using Artificial Neural Networks

More information

Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement

Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement Sentiment Analysis of Twitter Feeds for the Prediction of Stock Market Movement Ray Chen, Marius Lazer Abstract In this paper, we investigate the relationship between Twitter feed content and stock market

More information

How To Manage Project Management

How To Manage Project Management CS/SWE 321 Sections -001 & -003 Software Project Management Copyright 2014 Hassan Gomaa All rights reserved. No part of this document may be reproduced in any form or by any means, without the prior written

More information

Beating the MLB Moneyline

Beating the MLB Moneyline Beating the MLB Moneyline Leland Chen llxchen@stanford.edu Andrew He andu@stanford.edu 1 Abstract Sports forecasting is a challenging task that has similarities to stock market prediction, requiring time-series

More information

We discuss 2 resampling methods in this chapter - cross-validation - the bootstrap

We discuss 2 resampling methods in this chapter - cross-validation - the bootstrap Statistical Learning: Chapter 5 Resampling methods (Cross-validation and bootstrap) (Note: prior to these notes, we'll discuss a modification of an earlier train/test experiment from Ch 2) We discuss 2

More information

Java Modules for Time Series Analysis

Java Modules for Time Series Analysis Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series

More information

Regression Clustering

Regression Clustering Chapter 449 Introduction This algorithm provides for clustering in the multiple regression setting in which you have a dependent variable Y and one or more independent variables, the X s. The algorithm

More information

Latent Class Regression Part II

Latent Class Regression Part II This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS

OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS OBJECTIVE ASSESSMENT OF FORECASTING ASSIGNMENTS USING SOME FUNCTION OF PREDICTION ERRORS CLARKE, Stephen R. Swinburne University of Technology Australia One way of examining forecasting methods via assignments

More information

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.

More information

Big Data Techniques Applied to Very Short-term Wind Power Forecasting

Big Data Techniques Applied to Very Short-term Wind Power Forecasting Big Data Techniques Applied to Very Short-term Wind Power Forecasting Ricardo Bessa Senior Researcher (ricardo.j.bessa@inesctec.pt) Center for Power and Energy Systems, INESC TEC, Portugal Joint work with

More information

Predictive Modeling and Big Data

Predictive Modeling and Big Data Predictive Modeling and Presented by Eileen Burns, FSA, MAAA Milliman Agenda Current uses of predictive modeling in the life insurance industry Potential applications of 2 1 June 16, 2014 [Enter presentation

More information

Some Examples of (Markov Chain) Monte Carlo Methods

Some Examples of (Markov Chain) Monte Carlo Methods Some Examples of (Markov Chain) Monte Carlo Methods Ryan R. Rosario What is a Monte Carlo method? Monte Carlo methods rely on repeated sampling to get some computational result. Monte Carlo methods originated

More information

Forest Fire Information System (EFFIS): Rapid Damage Assessment

Forest Fire Information System (EFFIS): Rapid Damage Assessment Forest Fire Information System (EFFIS): Fire Danger D Rating Rapid Damage Assessment G. Amatulli, A. Camia, P. Barbosa, J. San-Miguel-Ayanz OUTLINE 1. Introduction: what is the JRC 2. What is EFFIS 3.

More information

Data Mining Introduction

Data Mining Introduction Data Mining Introduction Bob Stine Dept of Statistics, School University of Pennsylvania www-stat.wharton.upenn.edu/~stine What is data mining? An insult? Predictive modeling Large, wide data sets, often

More information

Module 6: Introduction to Time Series Forecasting

Module 6: Introduction to Time Series Forecasting Using Statistical Data to Make Decisions Module 6: Introduction to Time Series Forecasting Titus Awokuse and Tom Ilvento, University of Delaware, College of Agriculture and Natural Resources, Food and

More information

Characterizing Task Usage Shapes in Google s Compute Clusters

Characterizing Task Usage Shapes in Google s Compute Clusters Characterizing Task Usage Shapes in Google s Compute Clusters Qi Zhang 1, Joseph L. Hellerstein 2, Raouf Boutaba 1 1 University of Waterloo, 2 Google Inc. Introduction Cloud computing is becoming a key

More information

Using Web-based Software for Irrigation and Nitrogen Management in Onion Production: our Research Plan for 2013

Using Web-based Software for Irrigation and Nitrogen Management in Onion Production: our Research Plan for 2013 Using Web-based Software for Irrigation and Nitrogen Management in Onion Production: our Research Plan for 2013 Andre Biscaro, Farm Advisor UCCE Los Angeles County Michael Cahn, Farm Advisor UCCE Monterey

More information

Exponential Smoothing with Trend. As we move toward medium-range forecasts, trend becomes more important.

Exponential Smoothing with Trend. As we move toward medium-range forecasts, trend becomes more important. Exponential Smoothing with Trend As we move toward medium-range forecasts, trend becomes more important. Incorporating a trend component into exponentially smoothed forecasts is called double exponential

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Predictive Analytics in Workplace Safety:

Predictive Analytics in Workplace Safety: Predictive Analytics in Workplace Safety: Four Safety Truths that Reduce Workplace Injuries A Predictive Solutions White Paper Many industries and business functions are taking advantage of their big data

More information

Forecasting methods applied to engineering management

Forecasting methods applied to engineering management Forecasting methods applied to engineering management Áron Szász-Gábor Abstract. This paper presents arguments for the usefulness of a simple forecasting application package for sustaining operational

More information

FINDING SUBGROUPS OF ENHANCED TREATMENT EFFECT. Jeremy M G Taylor Jared Foster University of Michigan Steve Ruberg Eli Lilly

FINDING SUBGROUPS OF ENHANCED TREATMENT EFFECT. Jeremy M G Taylor Jared Foster University of Michigan Steve Ruberg Eli Lilly FINDING SUBGROUPS OF ENHANCED TREATMENT EFFECT Jeremy M G Taylor Jared Foster University of Michigan Steve Ruberg Eli Lilly 1 1. INTRODUCTION and MOTIVATION 2. PROPOSED METHOD Random Forests Classification

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

System Identification for Acoustic Comms.:

System Identification for Acoustic Comms.: System Identification for Acoustic Comms.: New Insights and Approaches for Tracking Sparse and Rapidly Fluctuating Channels Weichang Li and James Preisig Woods Hole Oceanographic Institution The demodulation

More information

Storm Prediction in a Cloud. Ian Davis, Hadi Hemmati, Ric Holt, Mike Godfrey Douglas Neuse, Serge Mankovskii

Storm Prediction in a Cloud. Ian Davis, Hadi Hemmati, Ric Holt, Mike Godfrey Douglas Neuse, Serge Mankovskii Storm Prediction in a Cloud Ian Davis, Hadi Hemmati, Ric Holt, Mike Godfrey Douglas Neuse, Serge Mankovskii Load Balancing in Clouds The goal / balancing act: Want to maximise delivery of cloud services

More information

Lecture 10: Regression Trees

Lecture 10: Regression Trees Lecture 10: Regression Trees 36-350: Data Mining October 11, 2006 Reading: Textbook, sections 5.2 and 10.5. The next three lectures are going to be about a particular kind of nonlinear predictive model,

More information

Supervised Learning (Big Data Analytics)

Supervised Learning (Big Data Analytics) Supervised Learning (Big Data Analytics) Vibhav Gogate Department of Computer Science The University of Texas at Dallas Practical advice Goal of Big Data Analytics Uncover patterns in Data. Can be used

More information

Session 9 Case 3: Utilizing Available Software Statistical Analysis

Session 9 Case 3: Utilizing Available Software Statistical Analysis Session 9 Case 3: Utilizing Available Software Statistical Analysis Michelle Phillips Economist, PURC michelle.phillips@warrington.ufl.edu With material from Ted Kury Session Overview With Data from Cases

More information

State Unemployment Rate Nowcasts * Elif Sen October 2010

State Unemployment Rate Nowcasts * Elif Sen October 2010 State Unemployment Rate Nowcasts * Elif Sen October 2010 For the month of August, the unemployment rates for Pennsylvania, New Jersey, and Delaware were 9.2 percent, 9.6 percent, and 8.4 percent, respectively,

More information

Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario jdu43@uwo.ca
