Evaluating the generalizability of an RCT using electronic health records data

1 Evaluating the generalizability of an RCT using electronic health records data

2 3 interesting questions
- Is our RCT representative?
- How can we generalize RCT results?
- Can we use EHR* data as a control group?
*) Electronic Health Records

4 Difference in data collection
The RCT data:
- Patients included by criteria
- Regular visits
- Measurements according to protocol
- Extensive data cleaning
- Usually randomized
- Missing could be non-informative, e.g. LDL missing due to lost sample
The EHR data:
- Patients included if they go to the doctor
- Visits as needed
- Measurements as needed
- Data cleaning?
- Confounding
- Missing is almost never non-informative, e.g. LDL not needed and thus missing
Placebo effect

5 An illustrative(?) example
One large global CV outcome study in primary prevention with liberal inclusion/exclusion criteria.
Three CPRD cohorts indicated for primary prevention of CV disease were created from a 6 year study window ( ):
- Cohort 1: patients meeting the trial's inclusion criteria and excluding those with prior CV history or CRP value indicating severe bacterial infection
- Cohort 2: patients meeting NICE guidelines for recommended statin therapy in primary prevention of CV disease

6 Sample sizes and variables
Sample sizes
Table columns: Data set, N, N complete cases, N after imputation
Table rows: RCT, CPRD, CPRD (cohort labels: Realistic cohort, Very broad cohort)
The analysis was run on two different sets of variables:
- Framingham variables: AGE, BMI, TC, sex, smoking
- All* variables: AGE, ANTIHT_USE, ASA_USE, BMI, CRP, DBP, FPG, HDL, LDL, SBP, TC, TG, sex, smoking, weight

7 RWE Introduction, February 2013

9 If we have the RCT results, what can we say about the effect in a different population?

10 Is our trial representative? Compare patient characteristics
- One variable at a time
- All variables at the same time
  - Convex hull
  - Cross matching
  - (Linear) discriminant analysis

11 Descriptive statistics per variable
Table columns: RCT mean, RCT std, EHR mean, EHR std
Table rows: GENDER_MALE, AGE, WEIGHT, BMI, SBP, DBP, CIGS/DAY, SMOKER, FPG, GLUC, LDL, HDL, TG, TC, HBA1C, CRP, ASA USE, ANTIHT USE, MEDHIST DM, FAMHIST CHD

12 Convex hull
Idea: Construct the convex hull for the RCT patients and see how many RWE patients fall into it.
Definition: the convex hull of a set S is the smallest convex set that contains S.
- 1-d convex hull: the range
- 2-d convex hull:
- K-d convex hull: wrap it in (stiff) paper
Example: matchit(formula = trial ~ AGE + BMI + CRP + SBP + DBP + sex + LDL + FPG + CR_CL + TC + TG + HDL, data = anadata, method = "nearest", discard = "hull.control")
Sample sizes (EHR patients): All 3933; in the RCT convex hull 254; discarded 3679.
Almost all of the EHR patients lie outside the convex hull of the RCT patients. It is enough to be extreme on one variable (or in one direction).
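The hull check can be illustrated in two dimensions with a minimal sketch. This is not the MatchIt computation used in the example. The (AGE, BMI) values are made up, and the function names are hypothetical. It builds the hull of the RCT points and counts the EHR points inside it.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def in_hull(hull, p):
    """True if p lies inside or on the boundary of a counter-clockwise convex polygon."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))

# Made-up (AGE, BMI) pairs, not data from the study.
rct = [(50, 22), (50, 32), (75, 22), (75, 32), (60, 27)]
ehr = [(55, 25), (70, 30), (45, 27), (60, 40), (80, 35)]

hull = convex_hull(rct)
inside = [p for p in ehr if in_hull(hull, p)]
print(len(inside), "of", len(ehr), "EHR patients fall in the RCT hull")
```

The point (60, 40) has a typical age but an extreme BMI, which is already enough to place it outside the hull.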

13 Cross matching
Cross matching as a test comparing multivariate distributions, Rosenbaum (2005)
Idea:
- Merge all the data
- Create match pairs using the Mahalanobis distance
- Count the number of cross matches (A matched to B)
- The number of cross matches has a known distribution under H0
Observed number of cross matches: 907
Expected number of cross matches under H0: 1183
P-value: 10**(-35)
Indicates that the RCT and RWE populations differ
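A minimal sketch of the last step, the null distribution of the cross-match count, assuming the pairing has already been done. The matching itself (optimal pairing on Mahalanobis distance) is not implemented here, and the group sizes are hypothetical. The formula is the exact distribution from Rosenbaum (2005) as I understand it: P(A1 = a1) = 2^a1 I! / (C(N, n) a0! a1! a2!), with I = N/2 pairs.

```python
from fractions import Fraction
from math import comb, factorial

def crossmatch_pmf(n, m):
    """Exact null distribution of the number of cross-matched pairs A1
    for group sizes n and m, where n + m must be even."""
    total = n + m
    if total % 2:
        raise ValueError("n + m must be even so that every subject is paired")
    pairs = total // 2
    pmf = {}
    # A1 has the same parity as n, because the other n - A1 members of
    # group A are paired among themselves.
    for a1 in range(n % 2, min(n, m) + 1, 2):
        a2 = (n - a1) // 2  # pairs with both members from group A
        a0 = (m - a1) // 2  # pairs with both members from group B
        pmf[a1] = Fraction(
            2 ** a1 * factorial(pairs),
            comb(total, n) * factorial(a0) * factorial(a1) * factorial(a2),
        )
    return pmf

def crossmatch_pvalue(observed, n, m):
    """P(A1 <= observed) under H0; few cross matches suggest the groups differ."""
    return sum(p for a1, p in crossmatch_pmf(n, m).items() if a1 <= observed)

# Hypothetical group sizes, far smaller than in the example above.
pmf = crossmatch_pmf(6, 4)
expected = sum(a1 * p for a1, p in pmf.items())
print("E[A1] =", expected)  # equals n * m / (N - 1)
print("P(A1 <= 0) =", crossmatch_pvalue(0, 6, 4))
```

Exact fractions keep the toy example free of rounding error. For sample sizes like those in the study a normal approximation would be the practical choice.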

14 Linear discriminant analysis
Try linear discriminant analysis to find a linear function that separates RCT and EHR patients.
Coefficients of linear discriminants (LD1) for: AGE_I, BMI, CRP, SBP, DBP, LDL, FPG, CR_CL, TC, TG, HDL
[Figure: Comparing distribution of posterior prediction probabilities, RCT versus RWE (x-axis: posterior prediction probabilities; y-axis: percent of total). RCT: Mean=0.58; RWE: Mean=]

15 Next steps
- Propensity score (Stuart & Cole 2010, 2011)
- Cross design synthesis (Kaizar 2009)
- Hierarchical models (Prevost et al 2000)

16 The nice thing about propensity score
$e(X) = P(T = 1 \mid X)$
$T \perp X \mid e(X)$

17 For us
- S indicates membership in the RCT sample
- T indicates treatment assignment, T = {1,0}
- covariates X
- potential outcomes: Y(1) would be observed under treatment, Y(0) would be observed under control
The sample average treatment effect: $\mathrm{SATE} = \frac{1}{n} \sum_{i:\, S_i = 1} \bigl(Y_i(1) - Y_i(0)\bigr)$
The population average treatment effect: $\mathrm{PATE} = \frac{1}{N} \sum_{i=1}^{N} \bigl(Y_i(1) - Y_i(0)\bigr)$

18 Key assumptions
- All patients in the population have some probability of being in the trial and no patients are always in the trial: $0 < P(S = 1 \mid X) < 1$
- Inclusion in the trial does not depend on the potential outcomes except through the covariates X: $S \perp (Y(0), Y(1)) \mid X$
- Treatment assignment does not depend on the inclusion into the trial or the potential outcomes except through X: $T \perp (S, Y(0), Y(1)) \mid X$

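19 Propensity score as a distance measure
$\Delta_p = \frac{1}{n} \sum_{i:\, S_i = 1} \hat{e}(x_i) - \frac{1}{N - n} \sum_{i:\, S_i = 0} \hat{e}(x_i)$
So, what constitutes a big difference?
Suggestion: big if $|\Delta_p| > c\,\hat{\sigma}_p$; c = 0.25 or 0.1 has been suggested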
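A minimal sketch of the propensity score difference, assuming the scores have already been estimated. The scores are made up. Dividing by the standard deviation of all scores pooled together is an assumption about how the difference is standardised.

```python
from statistics import mean, stdev

def propensity_difference(scores_trial, scores_population):
    """Return (raw, standardised) difference in mean propensity score,
    trial minus population. The standardised value divides by the standard
    deviation of all scores pooled together (an assumption of this sketch)."""
    raw = mean(scores_trial) - mean(scores_population)
    pooled_sd = stdev(list(scores_trial) + list(scores_population))
    return raw, raw / pooled_sd

# Made-up estimated propensity scores P(S = 1 | X).
trial_scores = [0.6, 0.7, 0.8]
population_scores = [0.2, 0.3, 0.4, 0.5]

raw, standardised = propensity_difference(trial_scores, population_scores)
for c in (0.1, 0.25):
    print("c =", c, "big difference:", abs(standardised) > c)
```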

20 Propensity score, all variables
[Figure: Comparing distribution of propensity scores, PLS-glm. Two histogram panels, Jupiter RCT versus CPRD (x-axis: propensity score; y-axis: percent of total).]
Cohort 1: Δ=0.165
Cohort 3: Δ=0.478

21 Propensity scores based on risk factors
[Figure: Comparing distribution of propensity scores, PLS-glm, FH. Two histogram panels, Jupiter RCT versus CPRD (x-axis: propensity score; y-axis: percent of total).]
Cohort 1: Δ=0.06
Cohort 3: Δ=0.38

22 Predict the results in the EHR population (PATE)
IPSW: inverse probability of selection weight
Weight: $w = \frac{P(S = 1)}{P(S = 1 \mid X)}$
In our case the endpoint is a time to event, so we'll fit a weighted Cox proportional hazards model with the partial likelihood
$L(\beta) = \prod_{i=1}^{n} \left[ \frac{\exp(x_i \beta)\, w_i}{\sum_{k=1}^{n} R_k(t_i) \exp(x_k \beta)\, w_k} \right]^{Y_i}$
Stuart & Cole 2011
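A minimal sketch of the weighting idea, with a weighted difference in means swapped in for the weighted Cox model to keep the example short. The trial records and the marginal selection probability are made up. The weight form P(S = 1) / P(S = 1 | X) is assumed.

```python
def ipsw_weights(selection_probs, marginal_prob):
    """Stabilised inverse probability of selection weights for trial participants."""
    return [marginal_prob / p for p in selection_probs]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical trial records: (treated, outcome, estimated P(S = 1 | X)).
trial = [
    (1, 2.0, 0.8), (1, 4.0, 0.2),
    (0, 1.0, 0.8), (0, 2.0, 0.2),
]
weights = ipsw_weights([p for _, _, p in trial], marginal_prob=0.5)

def arm_mean(arm, use_weights):
    rows = [(y, w) for (t, y, _), w in zip(trial, weights) if t == arm]
    ys = [y for y, _ in rows]
    ws = [w for _, w in rows] if use_weights else [1.0] * len(rows)
    return weighted_mean(ys, ws)

sate_hat = arm_mean(1, False) - arm_mean(0, False)  # unweighted trial estimate
pate_hat = arm_mean(1, True) - arm_mean(0, True)    # reweighted to the population
print("unweighted:", sate_hat, "IPSW-weighted:", pate_hat)
```

Patients who were unlikely to enter the trial (low selection probability) get the largest weights, so their outcomes dominate the reweighted estimate.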

23 Predicted treatment effect in the example
Table columns: Cohort, Variables used, Hazard ratio, Lower, Upper, p-value
Table rows: RCT; Cohort 1, All; Cohort 1, Framingham; Cohort 3, All; Cohort 3, Framingham
- The predicted treatment effect is slightly closer to 1, which indicates an under-representation of low-risk patients in the Jupiter population
- The difference is smallest for cohort 1 and largest for cohort 3
- Risk factors don't account for all confounding(?)

24 What is next? pacehr
Tralo (n=453)
EHR (n=38,000)
N=662
At least 2 exacerbations 1 year pre index
Age
Spirometry data
N=367
No COPD & 1 year follow up data
Only GINA 4 and 5 placebo (n=151)
N=193
ptw

25 EHR data as a control group
Weight the EHR data by: $\frac{P(S = 1 \mid X)}{P(S = 0 \mid X)}$
Weight the EHR data to estimate the outcome that would have been observed if the EHR data had the same distribution of patient characteristics as the RCT.
Sort of like estimating the ATT in an observational study.
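A minimal sketch of the odds weighting, assuming P(S = 1 | X) has already been estimated for each EHR patient. The outcomes and probabilities are made up, and the function names are hypothetical.

```python
def odds_weights(trial_probs):
    """w = P(S = 1 | X) / P(S = 0 | X) for each EHR patient."""
    if any(not 0.0 < p < 1.0 for p in trial_probs):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return [p / (1.0 - p) for p in trial_probs]

def weighted_control_mean(outcomes, trial_probs):
    """Outcome mean of the EHR patients after reweighting towards the RCT case mix."""
    w = odds_weights(trial_probs)
    return sum(y * wi for y, wi in zip(outcomes, w)) / sum(w)

# Hypothetical EHR patients: outcome and estimated P(S = 1 | X).
ehr_outcomes = [10.0, 20.0, 30.0]
ehr_trial_probs = [0.5, 0.2, 0.8]

control_mean = weighted_control_mean(ehr_outcomes, ehr_trial_probs)
print("weighted EHR control mean:", round(control_mean, 3))
```

EHR patients who look like trial participants (high P(S = 1 | X)) are weighted up, which is the same mechanism as ATT weighting of controls in an observational study.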

26 Doing without patient-level RCT data
- Evaluate the generalizability using Pressler's method
- Use weights from the method of moments (Signorovitch 2012)

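27 Doing without the RCT patient data 1
Idea: Use the RCT inclusion/exclusion criteria to split a registry cohort into RCT-eligible (I) and RCT non-eligible (E) patients. No actual RCT data needed!
Define the sub-population average treatment effect in each sub-population:
$\mathrm{PATE} = \frac{1}{N} \sum_{i=1}^{N} \bigl(Y_i(1) - Y_i(0)\bigr)$
$\mathrm{SPATE}(I) = \frac{1}{N_I} \sum_{i \in I} \bigl(Y_i(1) - Y_i(0)\bigr)$
$\mathrm{SPATE}(E) = \frac{1}{N_E} \sum_{i \in E} \bigl(Y_i(1) - Y_i(0)\bigr)$
Measure the generalization error by comparing outcomes for I and the whole population:
$\mathrm{SPATE}(I) - \mathrm{PATE} = \frac{N_E}{N} \bigl(\mathrm{SPATE}(I) - \mathrm{SPATE}(E)\bigr)$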
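A minimal sketch of the sub-population effects on simulated data. Both potential outcomes are listed for every patient, which is only possible in a simulation, and all values are made up. It also checks that the gap between the eligible-only effect and the population effect equals (N_E / N) times the difference between the two sub-population effects, which follows from the definitions.

```python
def average_effect(units):
    """Mean of Y(1) - Y(0) over a list of (Y(1), Y(0)) pairs."""
    return sum(y1 - y0 for y1, y0 in units) / len(units)

# Simulated potential outcomes (Y(1), Y(0)); not study data.
eligible = [(5.0, 3.0), (6.0, 3.0), (4.0, 3.0)]   # sub-population I
non_eligible = [(3.0, 3.0), (4.0, 3.0)]           # sub-population E

spate_i = average_effect(eligible)
spate_e = average_effect(non_eligible)
pate = average_effect(eligible + non_eligible)

n_e = len(non_eligible)
n = len(eligible) + n_e
generalization_error = spate_i - pate
print("SPATE(I):", spate_i, "SPATE(E):", spate_e, "PATE:", pate)
print("generalization error:", generalization_error)
```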

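28 Doing without patient-level data 2
Data sources: Registry R (patient-level data); Active and Placebo arms (aggregate data)
Weight the registry on: Gender, Age, BMI, SBP, LDL, TG, Smoking
Outcome Y; aggregate data: $(\bar{x}, \bar{y})$, $(\bar{x}_C, \bar{y}_C)$
Idea: Use $\hat{y}_C = \frac{\sum_i w_i y_i}{\sum_i w_i}$ where $w_i = \frac{P(S = 1 \mid x_i)}{P(S = 0 \mid x_i)}$
Estimate the weights using logistic regression: $w_i = \exp(x_i^T \hat{\beta})$
Method of moments, solve $\sum_i (x_i - \bar{x}_C) \exp(x_i^T \hat{\beta}) = 0$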
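A minimal sketch of the method-of-moments step for a single covariate, solved with Newton's method. The ages and the target mean are made up, and the function name is hypothetical. The weights are computed as exp(beta * (x - target)); these differ from exp(beta * x) only by a constant factor, which cancels in the normalised weighted mean.

```python
from math import exp

def moment_weights(x, target_mean, tol=1e-10, max_iter=100):
    """Solve sum((x_i - target) * exp(beta * (x_i - target))) = 0 for beta
    and return the weights exp(beta * (x_i - target))."""
    d = [xi - target_mean for xi in x]
    if not min(d) < 0 < max(d):
        raise ValueError("target mean must lie strictly inside the observed range")
    beta = 0.0
    for _ in range(max_iter):
        e = [exp(beta * di) for di in d]
        grad = sum(di * ei for di, ei in zip(d, e))       # moment condition
        hess = sum(di * di * ei for di, ei in zip(d, e))  # its derivative in beta
        step = grad / hess
        beta -= step
        if abs(step) < tol:
            break
    return [exp(beta * di) for di in d]

# Made-up registry ages and a made-up aggregate mean age from the trial.
registry_age = [50.0, 60.0, 70.0, 80.0]
trial_mean_age = 60.0

w = moment_weights(registry_age, trial_mean_age)
weighted_age = sum(wi * a for wi, a in zip(w, registry_age)) / sum(w)
print("weighted registry mean age:", round(weighted_age, 6))
```

With several covariates the same condition becomes a system of equations, one per covariate, and a general-purpose optimiser would replace the scalar Newton step.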

29 References
Imai K, King G, Stuart EA. Misunderstandings between experimentalists and observationalists about causal inference. Journal of the Royal Statistical Society A. 2008; 171.
Stuart EA, Cole SR, Bradshaw CP, Leaf PJ. The use of propensity scores to assess the generalizability of results from randomized trials. Journal of the Royal Statistical Society A. 2011; 174.
Cole SR, Stuart EA. Generalizing evidence from randomized clinical trials to target populations. American Journal of Epidemiology. 2010; 172.
Rosenbaum PR. An exact distribution-free test comparing two multivariate distributions based on adjacency. Journal of the Royal Statistical Society B. 2005; 67.
Prevost TC, Abrams KR, Jones DR. Hierarchical models in generalized synthesis of evidence: an example based on studies of breast cancer screening. Statistics in Medicine. 2000; 19.
Signorovitch et al. Matching-Adjusted Indirect Comparisons: A New Tool for Timely Comparative Effectiveness Research. Value in Health. 2012; 12.
Pressler TR, Kaizar EE. The use of propensity scores and observational data to estimate randomized controlled trial generalizability bias. Statistics in Medicine. 2013; 32.

The Analysis of Covariance. ERSH 8310 Keppel and Wickens Chapter 15

The Analysis of Covariance. ERSH 8310 Keppel and Wickens Chapter 15 The Analyss of Covarance ERSH 830 Keppel and Wckens Chapter 5 Today s Class Intal Consderatons Covarance and Lnear Regresson The Lnear Regresson Equaton TheAnalyss of Covarance Assumptons Underlyng the

More information

Directed acyclic graphs (DAGs) for causal analysis Supporting text. Ph.D. course in epidemiology. Simpson: baby playing cards

Directed acyclic graphs (DAGs) for causal analysis Supporting text. Ph.D. course in epidemiology. Simpson: baby playing cards Drected acyclc graphs (DAGs) for causal analyss Supportng text M.M. Glymour (2006). sng causal dagrams to understand common problems n socal epdemology. Ph.D. course n epdemology hapter 6 n J.M. Oakes

More information

THE TITANIC SHIPWRECK: WHO WAS

THE TITANIC SHIPWRECK: WHO WAS THE TITANIC SHIPWRECK: WHO WAS MOST LIKELY TO SURVIVE? A STATISTICAL ANALYSIS Ths paper examnes the probablty of survvng the Ttanc shpwreck usng lmted dependent varable regresson analyss. Ths appled analyss

More information

STATISTICAL DATA ANALYSIS IN EXCEL

STATISTICAL DATA ANALYSIS IN EXCEL Mcroarray Center STATISTICAL DATA ANALYSIS IN EXCEL Lecture 6 Some Advanced Topcs Dr. Petr Nazarov 14-01-013 petr.nazarov@crp-sante.lu Statstcal data analyss n Ecel. 6. Some advanced topcs Correcton for

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

Analysis of Covariance

Analysis of Covariance Chapter 551 Analyss of Covarance Introducton A common tas n research s to compare the averages of two or more populatons (groups). We mght want to compare the ncome level of two regons, the ntrogen content

More information

b) The mean of the fitted (predicted) values of Y is equal to the mean of the Y values: c) The residuals of the regression line sum up to zero: = ei

b) The mean of the fitted (predicted) values of Y is equal to the mean of the Y values: c) The residuals of the regression line sum up to zero: = ei Mathematcal Propertes of the Least Squares Regresson The least squares regresson lne obeys certan mathematcal propertes whch are useful to know n practce. The followng propertes can be establshed algebracally:

More information

Quality Adjustment of Second-hand Motor Vehicle Application of Hedonic Approach in Hong Kong s Consumer Price Index

Quality Adjustment of Second-hand Motor Vehicle Application of Hedonic Approach in Hong Kong s Consumer Price Index Qualty Adustment of Second-hand Motor Vehcle Applcaton of Hedonc Approach n Hong Kong s Consumer Prce Index Prepared for the 14 th Meetng of the Ottawa Group on Prce Indces 20 22 May 2015, Tokyo, Japan

More information

An Analysis of Factors Influencing the Self-Rated Health of Elderly Chinese People

An Analysis of Factors Influencing the Self-Rated Health of Elderly Chinese People Open Journal of Socal Scences, 205, 3, 5-20 Publshed Onlne May 205 n ScRes. http://www.scrp.org/ournal/ss http://dx.do.org/0.4236/ss.205.35003 An Analyss of Factors Influencng the Self-Rated Health of

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

Approximating Cross-validatory Predictive Evaluation in Bayesian Latent Variables Models with Integrated IS and WAIC

Approximating Cross-validatory Predictive Evaluation in Bayesian Latent Variables Models with Integrated IS and WAIC Approxmatng Cross-valdatory Predctve Evaluaton n Bayesan Latent Varables Models wth Integrated IS and WAIC Longha L Department of Mathematcs and Statstcs Unversty of Saskatchewan Saskatoon, SK, CANADA

More information

Questions that we may have about the variables

Questions that we may have about the variables Antono Olmos, 01 Multple Regresson Problem: we want to determne the effect of Desre for control, Famly support, Number of frends, and Score on the BDI test on Perceved Support of Latno women. Dependent

More information

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting

Causal, Explanatory Forecasting. Analysis. Regression Analysis. Simple Linear Regression. Which is Independent? Forecasting Causal, Explanatory Forecastng Assumes cause-and-effect relatonshp between system nputs and ts output Forecastng wth Regresson Analyss Rchard S. Barr Inputs System Cause + Effect Relatonshp The job of

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol

CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information

CS 2750 Machine Learning. Lecture 17a. Clustering. CS 2750 Machine Learning. Clustering

CS 2750 Machine Learning. Lecture 17a. Clustering. CS 2750 Machine Learning. Clustering Lecture 7a Clusterng Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Clusterng Groups together smlar nstances n the data sample Basc clusterng problem: dstrbute data nto k dfferent groups such that

More information

Statistical algorithms in Review Manager 5

Statistical algorithms in Review Manager 5 Statstcal algorthms n Reve Manager 5 Jonathan J Deeks and Julan PT Hggns on behalf of the Statstcal Methods Group of The Cochrane Collaboraton August 00 Data structure Consder a meta-analyss of k studes

More information

The covariance is the two variable analog to the variance. The formula for the covariance between two variables is

The covariance is the two variable analog to the variance. The formula for the covariance between two variables is Regresson Lectures So far we have talked only about statstcs that descrbe one varable. What we are gong to be dscussng for much of the remander of the course s relatonshps between two or more varables.

More information

Diabetes as a Predictor of Mortality in a Cohort of Blind Subjects

Diabetes as a Predictor of Mortality in a Cohort of Blind Subjects Internatonal Journal of Epdemology Internatonal Epdemologcal Assocaton 1996 Vol. 25, No. 5 Prnted n Great Brtan Dabetes as a Predctor of Mortalty n a Cohort of Blnd Subjects CHRISTOPH TRAUTNER,* ANDREA

More information

Multivariate EWMA Control Chart

Multivariate EWMA Control Chart Multvarate EWMA Control Chart Summary The Multvarate EWMA Control Chart procedure creates control charts for two or more numerc varables. Examnng the varables n a multvarate sense s extremely mportant

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 12 14 The Ch-squared dstrbuton PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 1 If a normal varable X, havng mean µ and varance σ, s standardsed, the new varable Z has a mean 0 and varance 1. When ths standardsed

More information

The Probit Model. Alexander Spermann. SoSe 2009

The Probit Model. Alexander Spermann. SoSe 2009 The Probt Model Aleander Spermann Unversty of Freburg SoSe 009 Course outlne. Notaton and statstcal foundatons. Introducton to the Probt model 3. Applcaton 4. Coeffcents and margnal effects 5. Goodness-of-ft

More information

Clustering Gene Expression Data. (Slides thanks to Dr. Mark Craven)

Clustering Gene Expression Data. (Slides thanks to Dr. Mark Craven) Clusterng Gene Epresson Data Sldes thanks to Dr. Mark Craven Gene Epresson Proles we ll assume we have a D matr o gene epresson measurements rows represent genes columns represent derent eperments tme

More information

An Efficient Framework for Online Advertising Effectiveness Measurement and Comparison

An Efficient Framework for Online Advertising Effectiveness Measurement and Comparison An Effcent Framework for Onlne Advertsng Effectveness Measurement and Comparson Pengyuan Wang Yahoo Labs 701 1st Ave, Sunnyvale Calforna 94089 pengyuan@yahoonc.com Han-Yun Tsao Yahoo Labs 701 1st Ave,

More information

Estimating Age-specific Prevalence of Testosterone Deficiency in Men Using Normal Mixture Models

Estimating Age-specific Prevalence of Testosterone Deficiency in Men Using Normal Mixture Models Journal of Data Scence 7(2009), 203-217 Estmatng Age-specfc Prevalence of Testosterone Defcency n Men Usng Normal Mxture Models Yungta Lo Mount Sna School of Medcne Abstract: Testosterone levels declne

More information

Time Series Analysis in Studies of AGN Variability. Bradley M. Peterson The Ohio State University

Time Series Analysis in Studies of AGN Variability. Bradley M. Peterson The Ohio State University Tme Seres Analyss n Studes of AGN Varablty Bradley M. Peterson The Oho State Unversty 1 Lnear Correlaton Degree to whch two parameters are lnearly correlated can be expressed n terms of the lnear correlaton

More information

Calculation of Sampling Weights

Calculation of Sampling Weights Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

A 'Virtual Population' Approach To Small Area Estimation

A 'Virtual Population' Approach To Small Area Estimation A 'Vrtual Populaton' Approach To Small Area Estmaton Mchael P. Battagla 1, Martn R. Frankel 2, Machell Town 3 and Lna S. Balluz 3 1 Abt Assocates Inc., Cambrdge MA 02138 2 Baruch College, CUNY, New York

More information

Statistical Methods to Develop Rating Models

Statistical Methods to Develop Rating Models Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and

More information

CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES

CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES CHAPTER 5 RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES In ths chapter, we wll learn how to descrbe the relatonshp between two quanttatve varables. Remember (from Chapter 2) that the terms quanttatve varable

More information

I529: Machine Learning in Bioinformatics (Spring 2013) Markov Models

I529: Machine Learning in Bioinformatics (Spring 2013) Markov Models I529: Machne Learnng n Bonformatcs (Sprng 213) Markov Models Yuzhen Ye School of Informatcs and Computng Indana Unversty, Bloomngton Sprng 213 Outlne Smple model (frequency & profle) revew Markov chan

More information

Nasdaq Iceland Bond Indices 01 April 2015

Nasdaq Iceland Bond Indices 01 April 2015 Nasdaq Iceland Bond Indces 01 Aprl 2015 -Fxed duraton Indces Introducton Nasdaq Iceland (the Exchange) began calculatng ts current bond ndces n the begnnng of 2005. They were a response to recent changes

More information

Introduction to Regression

Introduction to Regression Introducton to Regresson Regresson a means of predctng a dependent varable based one or more ndependent varables. -Ths s done by fttng a lne or surface to the data ponts that mnmzes the total error. -

More information

Quantization Effects in Digital Filters

Quantization Effects in Digital Filters Quantzaton Effects n Dgtal Flters Dstrbuton of Truncaton Errors In two's complement representaton an exact number would have nfntely many bts (n general). When we lmt the number of bts to some fnte value

More information

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

Binary Dependent Variables. In some cases the outcome of interest rather than one of the right hand side variables is discrete rather than continuous

Binary Dependent Variables. In some cases the outcome of interest rather than one of the right hand side variables is discrete rather than continuous Bnary Dependent Varables In some cases the outcome of nterest rather than one of the rght hand sde varables s dscrete rather than contnuous The smplest example of ths s when the Y varable s bnary so that

More information

Adaptive Clinical Trials Incorporating Treatment Selection and Evaluation: Methodology and Applications in Multiple Sclerosis

Adaptive Clinical Trials Incorporating Treatment Selection and Evaluation: Methodology and Applications in Multiple Sclerosis Adaptve Clncal Trals Incorporatng Treatment electon and Evaluaton: Methodology and Applcatons n Multple cleross usan Todd, Tm Frede, Ngel tallard, Ncholas Parsons, Elsa Valdés-Márquez, Jeremy Chataway

More information

Inequality and The Accounting Period. Quentin Wodon and Shlomo Yitzhaki. World Bank and Hebrew University. September 2001.

Inequality and The Accounting Period. Quentin Wodon and Shlomo Yitzhaki. World Bank and Hebrew University. September 2001. Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.

More information

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6

NPAR TESTS. One-Sample Chi-Square Test. Cell Specification. Observed Frequencies 1O i 6. Expected Frequencies 1EXP i 6 PAR TESTS If a WEIGHT varable s specfed, t s used to replcate a case as many tmes as ndcated by the weght value rounded to the nearest nteger. If the workspace requrements are exceeded and samplng has

More information

LETTER IMAGE RECOGNITION

LETTER IMAGE RECOGNITION LETTER IMAGE RECOGNITION 1. Introducton. 1. Introducton. Objectve: desgn classfers for letter mage recognton. consder accuracy and tme n takng the decson. 20,000 samples: Startng set: mages based on 20

More information

14.74 Lecture 5: Health (2)

14.74 Lecture 5: Health (2) 14.74 Lecture 5: Health (2) Esther Duflo February 17, 2004 1 Possble Interventons Last tme we dscussed possble nterventons. Let s take one: provdng ron supplements to people, for example. From the data,

More information

Logistic Regression. Steve Kroon

Logistic Regression. Steve Kroon Logstc Regresson Steve Kroon Course notes sectons: 24.3-24.4 Dsclamer: these notes do not explctly ndcate whether values are vectors or scalars, but expects the reader to dscern ths from the context. Scenaro

More information

Lecture 5,6 Linear Methods for Classification. Summary

Lecture 5,6 Linear Methods for Classification. Summary Lecture 5,6 Lnear Methods for Classfcaton Rce ELEC 697 Farnaz Koushanfar Fall 2006 Summary Bayes Classfers Lnear Classfers Lnear regresson of an ndcator matrx Lnear dscrmnant analyss (LDA) Logstc regresson

More information

Start me up: The Effectiveness of a Self-Employment Programme for Needy Unemployed People in Germany*

Start me up: The Effectiveness of a Self-Employment Programme for Needy Unemployed People in Germany* Start me up: The Effectveness of a Self-Employment Programme for Needy Unemployed People n Germany* Joachm Wolff Anton Nvorozhkn Date: 22/10/2008 Abstract In recent years actvaton of means-tested unemployment

More information

1. Measuring association using correlation and regression

1. Measuring association using correlation and regression How to measure assocaton I: Correlaton. 1. Measurng assocaton usng correlaton and regresson We often would lke to know how one varable, such as a mother's weght, s related to another varable, such as a

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

Lecture 10: Linear Regression Approach, Assumptions and Diagnostics

Lecture 10: Linear Regression Approach, Assumptions and Diagnostics Approach to Modelng I Lecture 1: Lnear Regresson Approach, Assumptons and Dagnostcs Sandy Eckel seckel@jhsph.edu 8 May 8 General approach for most statstcal modelng: Defne the populaton of nterest State

More information

The Current Employment Statistics (CES) survey,

The Current Employment Statistics (CES) survey, Busness Brths and Deaths Impact of busness brths and deaths n the payroll survey The CES probablty-based sample redesgn accounts for most busness brth employment through the mputaton of busness deaths,

More information

Testing GOF & Estimating Overdispersion

Testing GOF & Estimating Overdispersion Testng GOF & Estmatng Overdsperson Your Most General Model Needs to Ft the Dataset It s mportant that the most general (complcated) model n your canddate model lst fts the data well. Ths model s a benchmark

More information

U.C. Berkeley CS270: Algorithms Lecture 4 Professor Vazirani and Professor Rao Jan 27,2011 Lecturer: Umesh Vazirani Last revised February 10, 2012

U.C. Berkeley CS270: Algorithms Lecture 4 Professor Vazirani and Professor Rao Jan 27,2011 Lecturer: Umesh Vazirani Last revised February 10, 2012 U.C. Berkeley CS270: Algorthms Lecture 4 Professor Vazran and Professor Rao Jan 27,2011 Lecturer: Umesh Vazran Last revsed February 10, 2012 Lecture 4 1 The multplcatve weghts update method The multplcatve

More information

Naïve Bayes classifier & Evaluation framework

Naïve Bayes classifier & Evaluation framework Lecture aïve Bayes classfer & Evaluaton framework Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Generatve approach to classfcaton Idea:. Represent and learn the dstrbuton p x, y. Use t to defne probablstc

More information

II. PROBABILITY OF AN EVENT

II. PROBABILITY OF AN EVENT II. PROBABILITY OF AN EVENT As ndcated above, probablty s a quantfcaton, or a mathematcal model, of a random experment. Ths quantfcaton s a measure of the lkelhood that a gven event wll occur when the

More information

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data

Estimating the Number of Clusters in Genetics of Acute Lymphoblastic Leukemia Data Journal of Al Azhar Unversty-Gaza (Natural Scences), 2011, 13 : 109-118 Estmatng the Number of Clusters n Genetcs of Acute Lymphoblastc Leukema Data Mahmoud K. Okasha, Khaled I.A. Almghar Department of

More information

Survival analysis methods in Insurance Applications in car insurance contracts

Survival analysis methods in Insurance Applications in car insurance contracts Survval analyss methods n Insurance Applcatons n car nsurance contracts Abder OULIDI 1 Jean-Mare MARION 2 Hervé GANACHAUD 3 Abstract In ths wor, we are nterested n survval models and ther applcatons on

More information

Enhancing the Quality of Price Indexes A Sampling Perspective

Enhancing the Quality of Price Indexes A Sampling Perspective Enhancng the Qualty of Prce Indexes A Samplng Perspectve Jack Lothan 1 and Zdenek Patak 2 Statstcs Canada 1 Statstcs Canada 2 Abstract Wth the release of the Boskn Report (Boskn et al., 1996) on the state

More information

Analysis of Premium Liabilities for Australian Lines of Business

Analysis of Premium Liabilities for Australian Lines of Business Summary of Analyss of Premum Labltes for Australan Lnes of Busness Emly Tao Honours Research Paper, The Unversty of Melbourne Emly Tao Acknowledgements I am grateful to the Australan Prudental Regulaton

More information

Meta-analysis in Psychological Research.

Meta-analysis in Psychological Research. Internatonal Journal of Psychologcal Research, 010. Vol. 3. No. 1. ISSN mpresa (prnted 011-084 ISSN electrónca (electronc 011-079 Sánchez-Meca, J., Marín-Martínez, F., (010. Meta-analyss n Psychologcal

More information

Criminal Justice System on Crime *

Criminal Justice System on Crime * On the Impact of the NSW Crmnal Justce System on Crme * Dr Vasls Sarafds, Dscplne of Operatons Management and Econometrcs Unversty of Sydney * Ths presentaton s based on jont work wth Rchard Kelaher 1

More information

1 De nitions and Censoring

1 De nitions and Censoring De ntons and Censorng. Survval Analyss We begn by consderng smple analyses but we wll lead up to and take a look at regresson on explanatory factors., as n lnear regresson part A. The mportant d erence

More information
