Fast, Frugal and Focused:

Fast, Frugal and Focused: When less information leads to better decisions
Gregory Wheeler, Munich Center for Mathematical Philosophy, Ludwig Maximilians University of Munich
Konstantinos Katsikopoulos, Adaptive Behavior and Cognition Group, Max Planck Institute for Human Development
MCMP Colloquium, June 25, 2014

the total evidence norm
- Naive Bayes
- Neural Networks
- Rational Choice
- Linear Regression
- Dynamic Programming
- Bounded Rationality
"...in most situations we might as well throw away our information and toss a coin." - Richard Bellman

Ignoring Information & Better Predictions: 20 Studies on Economic, Educational and Psychological Predictions
[Figure: accuracy (% correct, axis from 55 to 75) in fitting and in prediction for Take The Best, Tallying (1/N), Multiple Regression, and Minimalist.]
Czerlinski, Gigerenzer, & Goldstein (1999)

heuristic structure and strategic biases
Take-the-Best (Gigerenzer & Goldstein 1996)
- Search Rule: Look up the cue with the highest validity.
- Stopping Rule: If cue values differ (+/-), stop search. If not, look up the next cue.
- Decision Rule: Predict that the alternative with the positive cue value has the higher criterion value.
- Bias: ignore cues
Tallying (1/N) (Dawes 1979)
- Search Rule: Look up cues in random order.
- Stopping Rule: After m (1 < m ≤ N) cues, stop the search.
- Decision Rule: Predict that the alternative with the higher number of positive cue values has the higher criterion value.
- Bias: ignore weights
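The two rule sets above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: cue values are assumed to be coded 1 (positive) and 0 (negative), the function names `take_the_best` and `tallying` and the example profiles are hypothetical, and returning `None` when no decision is reached (leaving the guess to the caller) is an assumption of this sketch.

```python
import random


def take_the_best(a, b, validities):
    """Go through cues from highest to lowest validity; decide on the first that differs."""
    order = sorted(range(len(validities)), key=lambda i: validities[i], reverse=True)
    for i in order:
        if a[i] != b[i]:                      # stopping rule: cue values differ
            return "A" if a[i] > b[i] else "B"
    return None                               # no cue discriminates (assumed: caller guesses)


def tallying(a, b, m, rng=random):
    """Look up m cues in random order and compare the counts of positive values."""
    n_cues = len(a)
    if not 1 < m <= n_cues:
        raise ValueError("m must satisfy 1 < m <= N")
    looked_up = rng.sample(range(n_cues), m)  # search rule: random order, stop after m cues
    tally_a = sum(a[i] for i in looked_up)
    tally_b = sum(b[i] for i in looked_up)
    if tally_a == tally_b:
        return None                           # tie (assumed: caller guesses)
    return "A" if tally_a > tally_b else "B"


# Hypothetical cue profiles for two alternatives and hypothetical cue validities.
alt_a = (1, 0, 1, 1)
alt_b = (1, 1, 0, 0)
cue_validities = [0.9, 0.8, 0.7, 0.6]

ttb_choice = take_the_best(alt_a, alt_b, cue_validities)
tally_choice = tallying(alt_a, alt_b, m=4)
print(ttb_choice, tally_choice)
```

In this example the two heuristics disagree: Take-the-Best stops at the second-ranked cue, which favors B, while Tallying counts three positive values for A against two for B. That is the sense in which one rule ignores cues and the other ignores weights.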

outline
- 1st Result
- A Puzzle
- 2nd Result (Puzzle Solved!)
- Coherentism and Heuristics

decision task and setup
Forced choice paired comparison task: Decide which of two alternatives, A and B, has the larger value on some numerical criterion, C, given their values on n cues X_1, ..., X_n.
Perfect Discrimination Assumption: Each cue discriminates among the alternatives.

accuracy as a function of size of training sample
Leave-one-out Cross Validation
There are n + 1 inferences to be made in the population:
- Training sample: n
- Test sample: 1
Cross-validation is repeated n + 1 times:
- Each of the n + 1 inferences serves as the test sample exactly once.
Labeling which cues do best:
- $v^{*}$: maximum cue validity in the n + 1 trials ($X^{*}$ is that cue).
- $v^{**}$: second maximum cue validity ($X^{**}$ is that cue).
Cue covariation:
- $\rho$ is the covariation between cues $X^{*}$ and $X^{**}$:
$$\rho = \Pr(X^{*} \text{ and } X^{**} \text{ are correct on trial } t) - \Pr(X^{*} \text{ is correct on trial } t)\,\Pr(X^{**} \text{ is correct on trial } t)$$

α: single-cue predictive accuracy measured by leave-one-out validation
$$\alpha = \frac{1}{2}\left( v^{*}\left(1 + v^{*} - \frac{1}{n+1}\right) + \rho \right) \quad \text{when } v^{*} - v^{**} = \frac{1}{n+1}, \text{ and } v^{*} \text{ otherwise.}$$
- n: size of training sample
- $\rho$: cue covariation
- $v^{*}$: maximum cue validity in population of n + 1 trials ($v^{**}$ is the second maximum cue validity)
Assumptions:
- Perfect Discrimination Assumption
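The leave-one-out procedure and the accuracy formula can be checked against each other on a small data set. The sketch below assumes the formula reads $\alpha = \frac{1}{2}\big(v^{*}(1 + v^{*} - \frac{1}{n+1}) + \rho\big)$ for the case $v^{*} - v^{**} = \frac{1}{n+1}$, and it assumes that ties in training-sample validity are broken uniformly at random. The tie-breaking rule is this sketch's assumption, not something the slides state. The data matrix `correct` is hypothetical; exact fractions are used so the comparison is not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical data: correct[t][i] == 1 if cue i decides trial t correctly.
# Every cue decides every trial (Perfect Discrimination Assumption).
correct = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
]


def validities(rows):
    """Share of trials on which each cue is correct."""
    return [Fraction(sum(col), len(rows)) for col in zip(*rows)]


def loo_single_cue_accuracy(rows):
    """Expected accuracy when the best training cue decides the held-out trial."""
    total = Fraction(0)
    for t, held_out in enumerate(rows):
        training = rows[:t] + rows[t + 1:]
        counts = [sum(col) for col in zip(*training)]
        best = max(counts)
        tied = [i for i, k in enumerate(counts) if k == best]
        # Assumed tie-breaking: each tied cue is chosen with equal probability.
        total += Fraction(sum(held_out[i] for i in tied), len(tied))
    return total / len(rows)


v = validities(correct)
ranked = sorted(range(len(v)), key=lambda i: v[i], reverse=True)
star, star2 = ranked[0], ranked[1]            # indices of X* and X**

both_correct = Fraction(sum(r[star] * r[star2] for r in correct), len(correct))
rho = both_correct - v[star] * v[star2]       # cue covariation

n = len(correct) - 1                          # size of the training sample
formula = Fraction(1, 2) * (v[star] * (1 + v[star] - Fraction(1, n + 1)) + rho)
observed = loo_single_cue_accuracy(correct)
print(observed, formula)  # prints "3/5 3/5"
```

Here $v^{*} = 4/5$ and $v^{**} = 3/5$, so the two best cues differ by exactly one trial, which is the case the formula covers. When the gap is larger, leaving out one trial cannot change which cue ranks first in training, and the leave-one-out accuracy is simply $v^{*}$.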

Approximate Single-Cue Predictive Accuracy as a function of size of training sample (in 19 data sets)
[Figure: accuracy (% correct, axis from 55 to 75) against the number of objects in the training sample, o = 2, ..., 10. Series: Single Cue Predicted Accuracy (theory), Take The Best (observed), Naive Bayes (observed). Parameters: $v^{*} = .82$, $\rho = .01$, $n = \frac{1}{2}o(o+1)$, giving $\alpha = .75 - \frac{.41}{n+1}$.]
(Katsikopoulos, Wheeler and Şimşek, 2014 tr)
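The theoretical curve in the figure can be recomputed from the printed parameters. The sketch below assumes the accuracy formula $\alpha = \frac{1}{2}\big(v^{*}(1 + v^{*} - \frac{1}{n+1}) + \rho\big)$ and takes $n = \frac{1}{2}o(o+1)$ as printed on the slide; the function names are illustrative only.

```python
def single_cue_accuracy(v_star, rho, n):
    """Predicted leave-one-out accuracy of the single best cue."""
    return 0.5 * (v_star * (1 + v_star - 1 / (n + 1)) + rho)


def training_inferences(o):
    """Number of inferences n for o objects, as printed on the slide: n = o(o+1)/2."""
    return o * (o + 1) // 2


V_STAR, RHO = 0.82, 0.01   # parameter values printed on the slide

curve = {}
for o in range(2, 11):
    n = training_inferences(o)
    curve[o] = single_cue_accuracy(V_STAR, RHO, n)
    print(f"o={o:2d}  n={n:2d}  alpha={curve[o]:.4f}")
```

With these parameters the formula simplifies to $.7512 - \frac{.41}{n+1}$, which rounds to the slide's approximation $.75 - \frac{.41}{n+1}$: predicted accuracy rises with the size of the training sample and levels off just above 75%.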

single variable decision rules
A Brunswikian Question [Photo: Egon Brunswik]
Under what environmental conditions do single reason rules perform well? With high ρ, low ρ, or some other structural feature?
- Cues are highly intercorrelated (Hogarth & Karelaia 2005), measured by the average pairwise cue correlation $\rho_{X_i X_j}$
- Cues are independent (Baucells, Carrasco & Hogarth 2008)
- Cues are conditionally independent (Katsikopoulos & Martignon 2006)

central idea of focused correlation
Compare the covariation among the cues given the criterion, $\mathrm{Cov}[X_1, \dots, X_n \mid C = c]$, with their unconditional covariation, $\mathrm{Cov}[X_1, \dots, X_n]$.
With the comparison exponentiated and all random variables taken to be indicator functions, focused correlation is defined as:
$$\mathrm{For}_c(x_1, \dots, x_n) := \frac{\Pr(x_1, \dots, x_n \mid c) \,/\, \big(\Pr(x_1 \mid c) \cdots \Pr(x_n \mid c)\big)}{\Pr(x_1, \dots, x_n) \,/\, \big(\Pr(x_1) \cdots \Pr(x_n)\big)}$$
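To make the definition concrete, the sketch below computes focused correlation from a joint distribution over a criterion and two binary cues. It reads the definition as the ratio of the conditional association $\Pr(x_1, \dots, x_n \mid c) / \prod_i \Pr(x_i \mid c)$ to the unconditional association $\Pr(x_1, \dots, x_n) / \prod_i \Pr(x_i)$. The distribution, the dictionary layout keyed by `(c, x1, x2)`, and the function name are assumptions made for illustration.

```python
from itertools import product


def focused_correlation(joint, c, xs):
    """For_c(x_1..x_n) for a joint given as {(c, x_1, ..., x_n): probability}."""
    xs = tuple(xs)
    p_c = sum(p for k, p in joint.items() if k[0] == c)
    p_xs = sum(p for k, p in joint.items() if k[1:] == xs)
    p_xs_given_c = joint.get((c, *xs), 0.0) / p_c

    prod_marginal = 1.0      # Pr(x_1) * ... * Pr(x_n)
    prod_conditional = 1.0   # Pr(x_1 | c) * ... * Pr(x_n | c)
    for i, x in enumerate(xs, start=1):
        prod_marginal *= sum(p for k, p in joint.items() if k[i] == x)
        prod_conditional *= sum(
            p for k, p in joint.items() if k[0] == c and k[i] == x
        ) / p_c

    numerator = p_xs_given_c / prod_conditional
    denominator = p_xs / prod_marginal
    return numerator / denominator


# Hypothetical environment: two cues that are conditionally independent given C,
# with Pr(C = 1) = 0.5, Pr(X_i = 1 | C = 1) = 0.8 and Pr(X_i = 1 | C = 0) = 0.2.
p_c = {0: 0.5, 1: 0.5}
p_positive = {0: 0.2, 1: 0.8}
joint = {}
for c, x1, x2 in product((0, 1), repeat=3):
    p1 = p_positive[c] if x1 else 1 - p_positive[c]
    p2 = p_positive[c] if x2 else 1 - p_positive[c]
    joint[(c, x1, x2)] = p_c[c] * p1 * p2

for_value = focused_correlation(joint, c=1, xs=(1, 1))
print(round(for_value, 4))
```

In this environment the numerator equals 1 because the cues are conditionally independent given the criterion, while the denominator is 0.34 / 0.25 = 1.36 because the cues are positively associated unconditionally. The resulting value is below 1.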

single-cue accuracy as a function of criterion predictability and focused correlation
$$v_1 = \sum_{c, x_2, \dots, x_k} \frac{\Pr(C = c \mid X_1 = c, X_2 = x_2, \dots, X_k = x_k)}{\mathrm{For}_C(X_1 = c, X_2 = x_2, \dots, X_k = x_k)} \, \Pr(X_1 = c)\,\Pr(X_2 = x_2) \cdots \Pr(X_k = x_k)$$
The numerator, $\Pr(C = c \mid X_1 = c, X_2 = x_2, \dots, X_k = x_k)$, is the criterion predictability; the denominator is the focused correlation of the cues.
Result: single-cue accuracy increases as the ratio of criterion predictability to focused cue correlation increases.

solving the puzzle
- Cues should be dependent but conditionally independent given the criterion: $X_1 \not\perp X_2$ and $X_1 \perp X_2 \mid C$.
- Cues should be independent but conditionally dependent given the criterion: $X_1 \perp X_2$ and $X_1 \not\perp X_2 \mid C$.
[Diagrams: two graphs over C, X_1, and X_2, one for each independence pattern.]
Result 2
$$v_1 = \sum_{c, x_2, \dots, x_k} \frac{\Pr(C = c \mid X_1 = c, X_2 = x_2, \dots, X_k = x_k)}{\mathrm{For}_C(X_1 = c, X_2 = x_2, \dots, X_k = x_k)} \, \Pr(X_1 = c)\,\Pr(X_2 = x_2) \cdots \Pr(X_k = x_k)$$
where $\mathrm{For}_C = N / D$ with
$$N = \frac{\Pr(x_1, \dots, x_n \mid c)}{\Pr(x_1 \mid c) \cdots \Pr(x_n \mid c)}, \qquad D = \frac{\Pr(x_1, \dots, x_n)}{\Pr(x_1) \cdots \Pr(x_n)}$$
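The two independence patterns can be reproduced with small hand-built distributions. The sketch below uses a common-cause structure (the criterion influences both cues) for the first pattern and a common-effect structure (both cues influence the criterion) for the second. These structures and all numbers are assumptions chosen for illustration; the transcript preserves only the independence statements, not the graphs themselves. Since the cues are binary indicators, zero covariance is equivalent to independence.

```python
from itertools import product


def cue_covariance(joint, given_c=None):
    """Cov(X1, X2) for binary cues, optionally conditional on C = given_c.

    joint maps (c, x1, x2) to a probability.
    """
    rows = [(k, p) for k, p in joint.items() if given_c is None or k[0] == given_c]
    z = sum(p for _, p in rows)
    p1 = sum(p for k, p in rows if k[1] == 1) / z
    p2 = sum(p for k, p in rows if k[2] == 1) / z
    p12 = sum(p for k, p in rows if k[1] == 1 and k[2] == 1) / z
    return p12 - p1 * p2


# Assumed common-cause structure: Pr(C = 1) = 0.5 and
# Pr(X_i = 1 | C = c) = 0.8 if c == 1, else 0.2.
common_cause = {}
for c, x1, x2 in product((0, 1), repeat=3):
    q = 0.8 if c == 1 else 0.2
    common_cause[(c, x1, x2)] = 0.5 * (q if x1 else 1 - q) * (q if x2 else 1 - q)

# Assumed common-effect structure: X1, X2 are independent fair coins, C = X1 or X2.
common_effect = {}
for x1, x2 in product((0, 1), repeat=2):
    common_effect[(int(x1 or x2), x1, x2)] = 0.25

print("common cause :", cue_covariance(common_cause), cue_covariance(common_cause, given_c=1))
print("common effect:", cue_covariance(common_effect), cue_covariance(common_effect, given_c=1))
```

In the common-cause case the cues covary unconditionally and the covariance vanishes once the criterion is fixed. In the common-effect case the unconditional covariance is zero and a negative covariance appears once the criterion is fixed.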

resolving a discontinuity
[Figure: three graphs over C, X_1, and X_2, annotated with condition (A2), which involves P(C | X_1) = P(C | X_2).]

adaptive epistemic norms
- Conditional independence: lousy for total evidence, good for single cue
- Robustness of single cue (cond. independent cues, independent cues): deflationary focused corr., not inflationary focused corr.
- Total evidence coherence: inflationary focused corr.
Final Remarks:
- Coherentism and Heuristics are complementary
- Adaptive Epistemology

key references
- Baucells, M., J. A. Carrasco, and R. Hogarth (2008). Cumulative Dominance and Heuristic Performance in Binary Multi-attribute Choice, Operations Research, 56: 1289-1304.
- Bovens, L. and S. Hartmann (2003). Bayesian Epistemology, Oxford University Press.
- Olsson, E. (2005). Against Coherence, Oxford University Press.
- Hogarth, R. and N. Karelaia (2005). Ignoring Information in Binary Choice with Continuous Variables: When is less more? Journal of Mathematical Psychology, 49: 115-124.
- Katsikopoulos, K. and L. Martignon (2006). Naïve Heuristics for Paired Comparison: Some results on their relative accuracy, Journal of Mathematical Psychology, 50: 488-494.
- Katsikopoulos, K., L. Schooler, and R. Hertwig (2010). The Robust Beauty of Ordinary Information, Psychological Review, 117(4): 1259.
- Schlosshauer, M. and G. Wheeler (2011). Focused Correlation and the Jigsaw Puzzle of Variable Evidence, Philosophy of Science, 78(3): 276-92.
- Wheeler, G. and R. Scheines (2013). Coherence and Confirmation Through Causation, Mind, 122(435): 135-70.
- Wheeler, G. (2009). Focused Correlation and Confirmation, The British Journal for the Philosophy of Science, 60(1): 79-100.
- Wheeler, G. (2012). Explaining the Limits of Olsson's Impossibility Result, The Southern Journal of Philosophy, 50(1): 136-50.