Statistical Fallacies: Lying to Ourselves and Others



Similar documents
psychology and economics

AMS 5 CHANCE VARIABILITY

Book Review of Rosenhouse, The Monty Hall Problem. Leslie Burkholder 1

Section 7C: The Law of Large Numbers

TABLE OF CONTENTS. ROULETTE FREE System # ROULETTE FREE System #

R Simulations: Monty Hall problem

The Math. P (x) = 5! = = 120.

John Kerrich s coin-tossing Experiment. Law of Averages - pg. 294 Moore s Text

MONT 107N Understanding Randomness Solutions For Final Examination May 11, 2010

13.0 Central Limit Theorem

In the situations that we will encounter, we may generally calculate the probability of an event

Discrete Math in Computer Science Homework 7 Solutions (Max Points: 80)

Would You Like To Earn $1000 s With The Click Of A Button?

Practical Probability:


2 What is Rational: Normative. 4 Lottery & Gambling

This Method will show you exactly how you can profit from this specific online casino and beat them at their own game.

Standard 12: The student will explain and evaluate the financial impact and consequences of gambling.

Gaming the Law of Large Numbers

Ignoring base rates. Ignoring base rates (cont.) Improving our judgments. Base Rate Neglect. When Base Rate Matters. Probabilities vs.

Probability and Expected Value

Law of Large Numbers. Alexandra Barbato and Craig O Connell. Honors 391A Mathematical Gems Jenia Tevelev

Solutions: Problems for Chapter 3. Solutions: Problems for Chapter 3

Statistics. Head First. A Brain-Friendly Guide. Dawn Griffiths

Math 58. Rumbos Fall Solutions to Review Problems for Exam 2

Midterm Exam #1 Instructions:

High School Statistics and Probability Common Core Sample Test Version 2

The Easy Roulette System

STRIKE FORCE ROULETTE

We rst consider the game from the player's point of view: Suppose you have picked a number and placed your bet. The probability of winning is

MrMajik s Money Management Strategy Copyright MrMajik.com 2003 All rights reserved.

Probability --QUESTIONS-- Principles of Math 12 - Probability Practice Exam 1

Russell Hunter Publishing Inc

Lotto Master Formula (v1.3) The Formula Used By Lottery Winners

Moving on! Not Everyone Is Ready To Accept! The Fundamental Truths Of Retail Trading!

Lesson 1. Basics of Probability. Principles of Mathematics 12: Explained! 314

STA 371G: Statistics and Modeling

The overall size of these chance errors is measured by their RMS HALF THE NUMBER OF TOSSES NUMBER OF HEADS MINUS NUMBER OF TOSSES

University of Chicago Graduate School of Business. Business 41000: Business Statistics Solution Key

Beating Roulette? An analysis with probability and statistics.

calculating probabilities

c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

Chapter 16. Law of averages. Chance. Example 1: rolling two dice Sum of draws. Setting up a. Example 2: American roulette. Summary.

Week 2: Conditional Probability and Bayes formula

Statistics 100A Homework 3 Solutions

Elementary Statistics and Inference. Elementary Statistics and Inference. 17 Expected Value and Standard Error. 22S:025 or 7P:025.

A Few Basics of Probability

Economics 1011a: Intermediate Microeconomics

Term Project: Roulette

Elementary Statistics and Inference. Elementary Statistics and Inference. 16 The Law of Averages (cont.) 22S:025 or 7P:025.

Betting systems: how not to lose your money gambling

Deadly Mistakes That Can Damage Your Injury Claim. A free publication by the Law Offices of James Scott Farrin

Lecture 8 The Subjective Theory of Betting on Theories

Discrete Mathematics and Probability Theory Fall 2009 Satish Rao, David Tse Note 10

Ch. 13.3: More about Probability

Chapter 16: law of averages

Expected Value. 24 February Expected Value 24 February /19

V. RANDOM VARIABLES, PROBABILITY DISTRIBUTIONS, EXPECTED VALUE

Implications of Big Data for Statistics Instruction 17 Nov 2013

Beyond the Polyester Veil : A Personal Injury Negotiations Case Study

(SEE IF YOU KNOW THE TRUTH ABOUT GAMBLING)

Statistics 100A Homework 4 Solutions

If, under a given assumption, the of a particular observed is extremely. , we conclude that the is probably not

THE WINNING ROULETTE SYSTEM.

Orange High School. Year 7, Mathematics Assignment 2

Statistics and Probability

3.2 Roulette and Markov Chains

Second Midterm Exam (MATH1070 Spring 2012)

Statistics 151 Practice Midterm 1 Mike Kowalski

What are confidence intervals and p-values?

Risk Analysis Overview

3. Data Analysis, Statistics, and Probability

Project Management: Leadership vs. Dictatorship

Probability. a number between 0 and 1 that indicates how likely it is that a specific event or set of events will occur.

GRADE 3: WORKING TOGETHER

Bayesian Analysis for the Social Sciences

How To Run Statistical Tests in Excel

Survey of Nevada Casino Gaming Employees

Expected Value and the Game of Craps

Conditional Probability, Independence and Bayes Theorem Class 3, 18.05, Spring 2014 Jeremy Orloff and Jonathan Bloom

Basic Probability Theory II

The Two Envelopes Problem

How to Beat Online Roulette!

REWARD System For Even Money Bet in Roulette By Izak Matatya

PUBLIC HEALTH OPTOMETRY ECONOMICS. Kevin D. Frick, PhD

Five Business Uses for Snake Oil The #1 Selling Game

How To Get The Most Out Of The Momentum Trader

How To Find The Sample Space Of A Random Experiment In R (Programming)

Review. Bayesianism and Reliability. Today s Class

Getting personal: The future of communications

Types of Error in Surveys

How to make more money in forex trading W. R. Booker & Co. All rights reserved worldwide, forever and ever and ever.

Probability, statistics and football Franka Miriam Bru ckler Paris, 2015.

Pattern matching probabilities and paradoxes A new variation on Penney s coin game

Transcription:

Statistical Fallacies: Lying to Ourselves and Others "There are three kinds of lies: lies, damned lies, and statistics. Benjamin Disraeli +/- Benjamin Disraeli

Introduction Statistics, assuming they ve been done correctly, can t lie But the mathematical component is never the whole story, when statistics interact with the real world a process of deciding which statistics to look at, interpretation of the statistics and representation begins

Typical Errors Fallacies An error in reasoning Bias A partial perspective, either intentional or not, which taints the outlook of the whole Misleading Representation The statistics may be correct, that doesn t mean their representation is

FALLACIES

Failing to Break the Bank at Monte Carlo In a roulette game in Monte Carlo, during 1913, the ball landed in the black 26 times in a row. After this gamblers lost millions disproportionately betting red on the same wheel, in the assumption that red must turn up eventually

The Gambler s Fallacy This fallacy is based on the assumption that the universe somehow evens out, that the number of head thrown will be the same as the number of tails thrown The causation is back to front, the random nature of the event drives the distribution, the likely distribution doesn t drive the events

Correlation is not Causation

The Correlation High gun ownership is correlated with low crime rate The Causes High gun ownership cause low crime A low crime rate leads to high gun ownership Gun ownership and low crime rate cause each other Gun ownership and low crime rates are caused by something else Low crime rates are caused by something which is correlated with gun ownership Random chance

Unpicking the Chain of Causation It is difficult to pick which variable is causing the other Without extra information it may be impossible Easy to check in experiments, alter one variable and observe the effect Less easy in sociological studies where it is impossible to rerun an entire society

Conjunction Fallacy Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable? Linda is a bank teller. Linda is a bank teller and is active in the feminist movement.

Feminist Bank Teller The Conjunction

Framing the Question The previous question can be easily reframed Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. There are 100 people like Linda How many of these people are bank tellers? How many are both feminists and bank tellers?

The Ludic Fallacy The misuse of games to model real life situations The real world is a complicated system, there are many underlying systems and relationships which are not apparent

Why is this the case? Information may be missing, and it may not be known which information (if any) is missing (known unknowns, unknown unknowns ) Small unknown variations could have a large impact (different from chaos theory) Any model or theory can only be based on what has been observed. That which hasn t been observed will probably not be modelled.

An Example A coin is flipped 99 times, coming up heads each time. A scientist, and a confidence trickster are each asked what is the most likely outcome, heads or tails? The scientist, assuming that the coin is perfectly predicted by the model in their head and also being aware of the Gambler s Fallacy, says heads or tails are both equally likely The confidence trickster thinks something is up, and calls heads

Gotham City has a Werewolf Problem In the city of Gotham, population 1 million, there are 100 werewolves Batman can use his bat-wolf-detector TM, which has a false positive error rate of 0.1% Batman detects a werewolf! What happens?

Has Batman caught a werewolf? Naïve Calculation P(wolf)=1-p(error) P(wolf)=99.9% Batman has caught a werewolf! Improved Calculation (Bayes Theorem) P(wolf)=P(detect wolf)p(wolf)/p(detect) =P(detect wolf)p(wolf)/ P(detect wolf)p(wolf)+p(detect not wolf)p(not wolf) =0.999*100/1,000,000/(0.999*100/1,000,000)+ (0.001*999,900/1,000,000) =0.0908 Batman has only a 9% probability for successively catching a werewolf

This is the Base Rate Fallacy This occurs when the probability of a hypothesis H, given that an event E, has happened, is calculated without considering the prior probability of H and the total possibility of E occurring P A B = P B A P(A) P(B) Typically used as an argument against mass screening for terrorists, as the false positive noise will overwhelm any true results

What do these have in common?

A particular section of road has experienced an abnormally high number of accidents A speed camera was introduced and the number of accidents reduced The Madden Curse A high number of players, who have appeared on the front cover of the Madden American Football video game cover, will be injured or suffer career setbacks in the following year

Regression to the Mean: The Regression Fallacy Both of these are related to the same thing; Regression to the Mean In any system with natural fluctuations there will be exceptional measurements A road will experience a high number of accidents, and receive a speed camera. Natural variation then brings the accident rate down An athlete will do exceptionally well, and grace the cover of a video game, before random chance (injuries, failing to repeat previous success) pulls the player back down to earth

The Fallacy List Gambler s Fallacy Cause/Correlation Fallacy Conjunction Fallacy Ludic Fallacy The Base Rate Fallacy The Regression Fallacy Each of these particular types of thinking can blind a person to the underlying information in the numbers

BIAS

The 1936 American Presidential Election In 1936 a postal survey was conducted to predict the next president of the USA. The survey was comprised of readers of the American Literary Digest magazine, with additional responses from registered car and phone owners. The survey predicted Alf Landon, the Republican candidate, would easily win. The actual election was an easy victory for Franklin Roosevelt What Happened?

Sample Bias The people surveyed were not randomly chosen and were not a statistically representative sample of the American Population They were disproportionately rich, when compared to the average voter, and more likely to vote Republican

Cherry Picking A data set is rarely fully complete Sometimes through experimental limitations Sometimes through data being lost or unavailable Sometimes it is deliberately missing If the data set is large enough, and the data loss is not systematic then this shouldn t matter An Example: In 2004 under Bush, the poverty level in America was 12.7%, under Clinton in 1996, the level was at 13.7% From this we can conclude the Poverty reduced under the Bush Administration

Bush Reduced Poverty Did Bush reduce poverty? The poverty level in Bush s era was lower than in Clinton s, but the trend tells a different story

Publication Bias There is a natural tendency to prefer a positive result We want to hear what cures cancer, we don t want a list of medicinal failures This is not necessarily good science. A positive study is no more likely to be methodologically sound than a negative study Despite this, a positive study is more than three times as likely to be published

File Drawer Effect The previous example was publisher bias, but there is also researcher bias A researcher conducts a study with a statistically insignificant result. Rather than attempt to publish, the result is filed away and forgotten A piece of research is lost

But more importantly Out of 20 random trials, one will be deemed statistically significant at the P=0.05 level 20 researchers trial the promising new drug Pheelbettaphasta. 19 researchers show no improvement in the patients at the P=0.05 level. They don t publish 1 researcher detects an improvement. They publish, and a systematic bias has been introduced, unintended, into the research.

How to combat this? Pre-registration of medical trials before results If the trial is pre-registered then it will be impossible to hide negative results, willingly or unwillingly Better and less biased meta-reviews In meta-reviews the instinct is to look for high impact journal publications, by its nature this will overlook the negative results By looking for research that is more methodologically sound, regardless of the result, we get a better indication of the research

Data Dredging: Hunting for Meaning One effect of computers has been the development of Data Science, a new discipline aimed at distilling large data sets into manageable and understandable sections However it has also led to Data Dredging, the practice of sifting through millions of data points and hundreds of variables, searching for any connection

A Toy Example 100 data points are created, each with 50 separate variables This is the equivalent of a survey for 100 people with 50 questions In the example, the survey results are randomly generated, there is no relationship, by design

Results Although most relationship between variables are not significant some are Data dredging would pick up these spurious relationships as real Almost paradoxically, increasing the survey depth increases the chance of false relationships in the data

MISLEADING REPRESENTATIONS

Relative vs Absolute Relative An intervention reduces the probability of an accident by 50% Absolute An intervention reduces the number of accidents by 1 per month Which is better? It depends on the absolute incidence rate for the accident but 50% sounds better! (and is typically what will be reported by newspapers)

Technically correct, deliberately misleading

Technically correct, but unintentionally misleading

Not misleading, but too late

If You Really Want to Obscure Your Data

Pie Charts!

They Can Minimise Differences

Or Unduly Emphasise

Hide Variations Between Data Points

And with further tricks: 1 2 3 4

Force the Result! 1 2 3 4

Thank You