Sample Size Designs to Assess Controls

Similar documents

CHAPTER 3 EXAMPLES: REGRESSION AND PATH ANALYSIS

STA 4273H: Statistical Machine Learning

Imputing Values to Missing Data

Handling attrition and non-response in longitudinal data

Bayesian Phylogeny and Measures of Branch Support

Exam P - Total 23/

BayesX - Software for Bayesian Inference in Structured Additive Regression

Statistics Graduate Courses

An Introduction to Using WinBUGS for Cost-Effectiveness Analyses in Health Economics

Predicting the World Cup. Dr Christopher Watts Centre for Research in Social Simulation University of Surrey

Probabilistic Methods for Time-Series Analysis

Using SAS PROC MCMC to Estimate and Evaluate Item Response Theory Models

Basic Bayesian Methods

PELLISSIPPI STATE COMMUNITY COLLEGE MASTER SYLLABUS INTRODUCTION TO STATISTICS MATH 2050

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

STAT2400 STAT2400 STAT2400 STAT2400 STAT2400 STAT2400 STAT2400 STAT2400&3400 STAT2400&3400 STAT2400&3400 STAT2400&3400 STAT3400 STAT3400

Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care.

Towards running complex models on big data

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

SAS Certificate Applied Statistics and SAS Programming

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

CHAPTER 8 EXAMPLES: MIXTURE MODELING WITH LONGITUDINAL DATA

Bayesian Statistics: Indian Buffet Process

Chenfeng Xiong (corresponding), University of Maryland, College Park

The HB. How Bayesian methods have changed the face of marketing research. Summer 2004

Bayesian Predictive Profiles with Applications to Retail Transaction Data

Diploma in Business Management

Role of Automation in the Forensic Examination of Handwritten Items

The Proportional Odds Model for Assessing Rater Agreement with Multiple Modalities

Regression with a Binary Dependent Variable

PREDICTIVE DISTRIBUTIONS OF OUTSTANDING LIABILITIES IN GENERAL INSURANCE

THE USE OF STATISTICAL DISTRIBUTIONS TO MODEL CLAIMS IN MOTOR INSURANCE

Bayes and Naïve Bayes. cs534-machine Learning

SAMPLE SELECTION BIAS IN CREDIT SCORING MODELS

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle

UNIVERSITY of MASSACHUSETTS DARTMOUTH Charlton College of Business Decision and Information Sciences Fall 2010

Introduction to Statistics (Online) Syllabus/Course Information

Estimating and Correcting the Effects of Model Selection Uncertainty

MCMC-Based Assessment of the Error Characteristics of a Surface-Based Combined Radar - Passive Microwave Cloud Property Retrieval

A Bayesian hierarchical surrogate outcome model for multiple sclerosis

Probability and Statistics

PS 271B: Quantitative Methods II. Lecture Notes

BIG DATA: CONVENTIONAL METHODS MEET UNCONVENTIONAL DATA

Statistics in Applications III. Distribution Theory and Inference

Calculating Effect-Sizes

Program description for the Master s Degree Program in Mathematics and Finance

Sensitivity Analysis in Multiple Imputation for Missing Data

Bayesian Analysis for the Social Sciences

STAT3016 Introduction to Bayesian Data Analysis

Estimation and comparison of multiple change-point models

Incorporating prior information to overcome complete separation problems in discrete choice model estimation

Operational Risk Management: Added Value of Advanced Methodologies

Non-Life Insurance Mathematics

Model-based Synthesis. Tony O Hagan

Introduction to Markov Chain Monte Carlo

Organizing Your Approach to a Data Analysis

HIDDEN MARKOV MODELS FOR ALCOHOLISM TREATMENT TRIAL DATA

DATA MINING IN FINANCE

Chapter 5. Random variables

Master programme in Statistics

Analysis of Financial Time Series

Simulation Exercises to Reinforce the Foundations of Statistical Thinking in Online Classes

A Fuel Cost Comparison of Electric and Gas-Powered Vehicles

II. DISTRIBUTIONS distribution normal distribution. standard scores

Analysis of. Quiz on Sales Force

Statistics & Probability PhD Research. 15th November 2014

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Compression algorithm for Bayesian network modeling of binary systems

Examining credit card consumption pattern

A Bayesian Antidote Against Strategy Sprawl

Statistical Learning THE TENTH EUGENE LUKACS SYMPOSIUM DEPARTMENT OF MATHEMATICS AND STATISTICS BOWLING GREEN STATE UNIVERSITY

APPLIED MISSING DATA ANALYSIS

Latent Class Regression Part II

CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

Bayesian Statistical Analysis in Medical Research

Faculty of Science School of Mathematics and Statistics

Sampling via Moment Sharing: A New Framework for Distributed Bayesian Inference for Big Data

240ST014 - Data Analysis of Transport and Logistics

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

Credit Risk Models. August 24 26, 2010

Big Data, Statistics, and the Internet

Dealing with Missing Data

Markov Chain Monte Carlo Simulation Made Simple

Bayesian Adaptive Designs for Early-Phase Oncology Trials

Statistics Graduate Programs

Bayesian Statistics in One Hour. Patrick Lam

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Dynamic Modeling, Predictive Control and Performance Monitoring

Inference of Probability Distributions for Trust and Security applications

Dirichlet Processes A gentle tutorial

Missing Data. Katyn & Elena

PSI Pharmaceutical Statistics Journal Club Meeting David Ohlssen, Novartis. 25th November 2014

Likelihood Approaches for Trial Designs in Early Phase Oncology

Using DOE in Service Quality

CHAPTER 2 Estimating Probabilities

Transcription:

Sample Size Designs to Assess Controls B. Ricky Rambharat, PhD, PStat Lead Statistician Office of the Comptroller of the Currency U.S. Department of the Treasury Washington, DC FCSM Research Conference Session B-2 Nov. 4 th, 2013

Disclaimer The views expressed in this presentation are solely those of the author. This talk neither represents the opinions of the U.S. Department of the Treasury nor the Office of the Comptroller of the Currency (OCC). All errors are author's responsibility. As this talk is based on a paper in draft status, please do not reference without conferring with the author.

Motivation: How to Set Control?

Examples of Controls Currency Transaction Report (CTR) Flag if > US $10,000.00 Credit score cutoffs Accept / Deny applicant College applications and entrance exams Job applications and interview process Challenge: control-setting and risk

Type I vs. Type II Error Control Type I error: False positives Compliance costs Manpower Resource constraints Job offer to weak candidates Accept weak students Type II error: False negatives Criminal activity Unfair consumer practices

Sample Design & Analysis C Goal: Assess control C Definition: What is an error? Q: What sample size n to gauge Type II error? Assumption: Rare error view? Result: estimate error rate below C Follow-on: How to do deep-dives? 2-stage?

Sampling Models Hyper-Geometric distribution Sampling without replacement model Normal distribution (approx.) Cochran (1977) Accommodate rare errors? Binomial distribution (approx.) n, p Any 0<p<1; numerical solution Poisson distribution (approx.) Related to Binomial

Inference I: Total Errors Estimation of population error rate: p Hypothesis-testing problems & deep-dives Ballpark notion of error incidence Useful for regression modeling? Estimation of total number of errors: M Relevance to key policy decisions A different view of risk Informative for deep-dive studies?

Inference II: Total Errors Sampling models for M (discrete data) Hyper-Geometric distribution Binomial (alternative) Normal (OK could be inaccurate) Computations Prob. of exactly k errors in a sample of size n Prob. of at least k errors in a sample of size n

Bayesian Estimation Goal: Posterior distribution of M How are total errors distributed? Variability measures Risk evaluations Relevance for deep-dive analysis Markov chain Monte Carlo Alternative: Posterior distribution of p Beta-Binomial framework

Discrete Uniform Prior Distributions Mass on {0,1,2,,N} Vague / Non-informative (reflects ignorance) Poisson Mass on {0,1,2,3, } Leverage data Empirical Bayes Plan: use mixture to construct prior, p(m). Balance ignorance and data.

Likelihood Function Quantities of interest Pr(exactly k errors) Pr(at least k errors) Hyper-Geometric (HG) likelihood Limited for large N (population size) Workaround: Compute log-probabilities

Proposal Distribution Poisson proposal distribution Choose λ: another mixture? Weight λ ( ignorance vs. MCMC learning ) MCMC mixing properties Convergence of MCMC chain Satisfaction of detailed balance condition

Numerical Experiments INPUTS Set 1 Set 2 Set 3 Set 4 k (# of errors) 0 1 1 1 n (sample size) 100 100 40 100 N (pop. size) 30,000 30,000 30,000 1,000,000 Weights: Prior {DU; Pois} Weights: Prop. {Fix; Dynamic} {0.999; 0.001} {0.01; 0.99} {0.01; 0.99} {0; 1} {0.90; 0.10} {0.01; 0.99} {0.001; 0.999} {0; 1}

Posterior Results: Univariate Errors

Deep-Dive Sample Design Leverage stage 1 results for stage 2 Posterior simulations and deep-dive Probabilistic view of risk Easier for uni-modal posteriors Practical constraints vs. posterior outcome E.g., bi-modal distributions Relevance for sample size n = n(posterior)

Multinomial Experiment (Schematic)

Sample Size & Control-Setting Multinomial probabilities Chi-squared tests of significance Deep-dive and sample size by segment Estimate total errors in risky sub-populations Straightforward mapping to error rate Relevance to control settings

Multiple Levels of Errors Attribute sampling: binary error incidence E.g., degree of injury to consumer Posterior analysis Bivariate HG likelihood Bivariate Poisson proposal distribution Co-dependence between errors? General modeling E.g., Ordinal, Multinomial logit

Preliminary Conclusions Total number of errors vs. Error rate Posterior analysis of total errors Subjective / Data influence Sample design and posterior results Link control settings to risky sub-populations

Future Work Framework for multiple levels of errors MCMC estimation / posterior analysis Sample size and posterior results Logic for control settings Computational tool

END: THANK YOU