RANDOM-NUMBER GUESSING AND THE FIRST DIGIT PHENOMENON¹




Psychological Reports, 1988, 62, 967-971. © Psychological Reports 1988

THEODORE P. HILL
Georgia Institute of Technology

Summary. To what extent do individuals "absorb" the empirical regularities of their environment and reflect them in behavior? A widely accepted empirical observation called the First Digit Phenomenon or Benford's Law says that in collections of miscellaneous tables of data (such as physical constants, almanacs, newspaper articles, etc.), the first significant digit is much more likely to be a low number than a high number. In this study, an analysis of the frequencies of the first and second digits of "random" six-digit numbers guessed by people suggests that people's responses share some of the properties of Benford's Law: first digit 1 occurs much more frequently than expected; first digit 8 or 9 occurs much less frequently; and the second digits are much more uniformly distributed than the first.

It seems to be widely accepted that people cannot behave truly randomly, even in situations such as game-playing where success may depend on an ability to perform in an unpredictable way. If people are asked to generate random numbers, their responses differ significantly from truly random sequences; Tune (1964) has a good review of the literature. For example, an experiment of T. Varga [see Revesz (1978) or a related experiment by Bakan (1960)] is this. Half a class of students is asked to flip a coin 200 times and record the results; the other half is asked to fake a sequence of 200 tosses. By later declaring "fake" any sequence which fails to have a run of length at least six, Varga found he could distinguish between the true and the faked sequences with very high accuracy. Of course, once this rule is known the fakers can beat it, and in general it seems possible to learn to generate more nearly random sequences (cf. Neuringer, 1986).
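
Varga's run-length rule described above can be sketched in a few lines. This is only an illustrative sketch, not Varga's own procedure: the function names `longest_run` and `looks_faked`, the "H"/"T" encoding of tosses, and the simulation settings (seed and number of trials) are assumptions introduced here.

```python
import random


def longest_run(tosses):
    """Return the length of the longest run of identical consecutive outcomes."""
    longest = current = 0
    previous = None
    for toss in tosses:
        # Extend the current run if the outcome repeats, otherwise start a new run.
        current = current + 1 if toss == previous else 1
        longest = max(longest, current)
        previous = toss
    return longest


def looks_faked(tosses, min_run=6):
    """Declare a sequence 'fake' if it has no run of length at least min_run."""
    return longest_run(tosses) < min_run


if __name__ == "__main__":
    # Hypothetical check: how often does the rule wrongly flag genuine coin flips?
    rng = random.Random(0)  # fixed seed chosen arbitrarily for repeatability
    trials = 2000
    flagged = sum(
        looks_faked([rng.choice("HT") for _ in range(200)]) for _ in range(trials)
    )
    print(f"genuine 200-toss sequences flagged as fake: {flagged / trials:.3f}")
```

The rule works to the extent that genuine sequences of 200 tosses rarely lack a run of six, whereas people faking a sequence tend to avoid long runs.
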
As a second example, a statistical analysis of the winners of the Massachusetts Numbers Game by Chernoff (1981) yielded a collection of 33 four-digit numbers which were sufficiently unlikely to be picked by players as to make them potentially favorable bets. In that game, first the players bet on a four-digit number of their own choice, next a single four-digit number is drawn by a judge (or generated randomly), and then all players with the winning number share the pot equally. In such a situation it is advantageous to be able to identify numbers which few people choose, since all numbers are equally likely to be winners and the expected payoff for an unpopular number is then higher than that for a number which many people have chosen. (Of course, "unpopular" numbers become popular numbers as soon as they are learned.)

¹Research partially supported by NSF Grants DMS84-01604, 86-01608, and 87-01691. Request reprints of T. P. Hill, School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332. The author is grateful to the referees and to Professors J. Dugas, A. Neuringer, J. Marr, C. Spruill, Y. Tong, and especially A. Tversky for a number of conversations and suggestions.

Perhaps when people think they are generating a random number they are often basing it on numbers from their own experience, and if the numbers in their experience are not uniformly distributed (or "truly random"), this should be reflected in the responses. A widely quoted empirical observation, apparently first published by the physicist Benford (1938), is that randomly occurring tables of data tend to have entries that begin with low numbers: in particular, the first significant digit is 1 almost three times as often as one would expect and is 9 less than half as often as one would expect. The purpose of this research was to explore the possible connection between this First Digit Phenomenon and the responses of people trying to generate random numbers. If people are influenced by experience with numbers roughly obeying Benford's observations, one would expect their responses to have many more numbers beginning with 1 or 2 than with 8 or 9; this was the case, although not nearly as extremely as in Benford's data.

First Digit Phenomenon

Benford observed that tables of logarithms in libraries tend to be progressively dirtier near the beginning and wondered why students of science and engineering (say) have more occasion to calculate with numbers beginning with 1 or 2 than with 8 or 9.
He studied 20 tables of numbers including molecular weights of chemicals, surface areas of rivers, street addresses of famous people, baseball statistics, arithmetical sequences, and newspaper items and found that the leading significant (nonzero) digit in his data was 1 with frequency .306, as opposed to the uniform frequency 1/9 one would expect for a "truly random" distribution of the first significant digits. In general, Benford found empirically that the proportion of entries beginning with first (nonzero) digit d is well approximated by

\[ \log_{10}(d + 1) - \log_{10} d, \qquad d = 1, 2, \ldots, 9. \qquad [1] \]

There have subsequently been many theoretical models offered to explain [1], including those by Cohen (1976), Diaconis (1977), Flehinger (1966), and Raimi (1969, 1976); Raimi (1976) has a good survey of the literature on this problem. Similarly, Benford found that the frequencies of digits k places from the left are also nonuniform, although they do tend to be more uniformly distributed as k increases. The proportion of entries with second significant digit d is approximately

\[ \log_{10} \prod_{j=1}^{9} (10j + d + 1) - \log_{10} \prod_{j=1}^{9} (10j + d), \qquad d = 0, 1, \ldots, 9. \qquad [2] \]

For example, the second significant digit is 1 with probability

\[ \log_{10}(12 \cdot 22 \cdots 92) - \log_{10}(11 \cdot 21 \cdots 91) \approx .114; \]

Table 1 below includes the relative frequency of the first and second digits according to the laws [1] and [2].

Method

A large number (742) of undergraduate calculus students were given slips of paper and asked to write down a six-digit random number "out of their heads," i.e., no calculators, coin-flipping, etc. The frequencies of the first and second significant digits in their responses were calculated and are summarized in the following table, along with the theoretical frequencies of the first and second digit laws [1] and [2] (which are referred to as Benford frequencies in the table), and the "truly random" or uniform distributions. (Two responses of all zeroes were not included in the table.) The choice of number of digits requested (six) was made at A. Tversky's suggestion.

Many natural variations of the experiment are possible, among them: requesting a different, or unspecified, number of digits; allowing more time, say several days, for responses; including rewards of some type, such as a lottery jackpot as in the numbers game studied by Chernoff or a reward structure of Neuringer (1986); using subjects from different classes of mathematical sophistication [Chapanis (1953) found that more mathematically sophisticated subjects responded with more nearly uniform (random) sequences]; and allowing the subjects to call out, use a computer keyboard, or otherwise register their responses.

Results

Analysis of the data in Table 1 was made using the chi-squared and Kolmogorov-Smirnov goodness-of-fit tests.
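
The digit laws [1] and [2] and the goodness-of-fit statistics named above can be evaluated with a short sketch. It is only an illustration under stated assumptions: all function names are hypothetical, stripping leading zeros to find the significant digits is an assumed reading of the tallying procedure, and no observed counts are included because they must come from the experiment itself.

```python
import math


def benford_first(d):
    """Law [1]: proportion of entries with first significant digit d (1..9)."""
    return math.log10(d + 1) - math.log10(d)


def benford_second(d):
    """Law [2]: proportion of entries with second significant digit d (0..9)."""
    return sum(
        math.log10(10 * j + d + 1) - math.log10(10 * j + d) for j in range(1, 10)
    )


def leading_digits(response):
    """Return (first, second) significant digits of a digit string.

    Assumption: leading zeros are not significant. Returns None for an
    all-zero response, and second is None if only one digit remains.
    """
    stripped = response.lstrip("0")
    if not stripped:
        return None
    second = int(stripped[1]) if len(stripped) > 1 else None
    return int(stripped[0]), second


def chi_squared(observed_counts, expected_probs):
    """Pearson chi-squared statistic of observed counts against expected proportions."""
    n = sum(observed_counts)
    return sum(
        (o - n * p) ** 2 / (n * p) for o, p in zip(observed_counts, expected_probs)
    )


def ks_statistic(observed_counts, expected_probs):
    """Largest gap between observed and expected cumulative proportions."""
    n = sum(observed_counts)
    gap = cum_obs = cum_exp = 0.0
    for o, p in zip(observed_counts, expected_probs):
        cum_obs += o / n
        cum_exp += p
        gap = max(gap, abs(cum_obs - cum_exp))
    return gap


if __name__ == "__main__":
    print("digit  law [1]  uniform  law [2]  uniform")
    for d in range(10):
        first = f"{benford_first(d):.3f}    {1 / 9:.3f}" if d >= 1 else "  -        -  "
        print(f"{d:>5}  {first}    {benford_second(d):.3f}    {0.1:.3f}")
```

With nine first-digit categories the chi-squared statistic has 8 degrees of freedom, and with ten second-digit categories it has 9; the statistic would be compared with the corresponding critical value at the chosen significance level.
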
Other tests such as measure of information content or autocorrelation are also possible; the reader is referred to Neuringer (1986) for references.

TABLE 1
FREQUENCIES OF FIRST AND SECOND SIGNIFICANT DIGITS
Columns: First Digit, Observed, Benford (1), Uniform; Second Digit, Observed, Benford (2), Uniform. [Table entries not recovered.]

As one would expect by looking at Table 1 (left), the null hypothesis that the response frequencies obey Benford's Law is rejected at the .05 significance level by both the chi-squared (8 df) test and by the Kolmogorov-Smirnov test. Table 1 (right) shows better agreement; the chi-squared (9 df) test permits acceptance of the null hypothesis that the true distribution is Benford at the .05 significance level whereas the Kolmogorov-Smirnov test requires rejection; both tests permit acceptance of the null hypothesis that the true distribution is uniform at the .05 level.

Discussion

Although not conforming precisely to the predictions of the First Digit Law, the results of the experiment indicate that the distributions of random numbers guessed by people share the following properties with the Benford distributions: (i) the frequency of numbers with first significant digit 1 is much higher than expected; (ii) the frequency of numbers with first significant digit 8 or 9 is much lower than expected; and (iii) the distribution of the second digits is much more nearly uniform than the distribution of the first digits. These conclusions are consistent with Chernoff's (1981) findings that generally high numbers are less likely to be chosen in numbers games. Of the 33 numbers in his "first system" (numbers with predicted normalized payoffs exceeding 1.0), 16 had first significant digit 8 or 9, and only one had first significant digit 1 or 2. (Recall that Chernoff tried to identify numbers which were unlikely, so a high proportion of large numbers in his list corresponds to a low frequency of occurrence of high numbers.) These conclusions are also undoubtedly related to other known common response phenomena such as the tendency, in interest surveys, to select the first choice early in the list. Accordingly, these results seem quite germane for psychometricians involved in the design of tests.
Several other questions are raised by this experiment. For example, Table 1 (left) suggests that numbers with leading significant digit 5 are much less likely to occur than those with 4 or 6, at least by calculus students. Is this true in general, and if so, why? Conclusion (iii) above suggests that people do not generate random numbers digit-by-digit but rather according to some other rule. Is this true in general? The data in Table 1 also suggest a tendency towards bimodality, with local minima at 5 or 6. Does this perhaps reflect two separate subpopulations with one choosing numbers early in the series and one choosing from the high end?

REFERENCES

BAKAN, P. (1960) Response-tendencies in attempts to generate binary series. Amer. J. Psychol., 73, 127-131.
BENFORD, F. (1938) The law of anomalous numbers. Proc. Amer. Philos. Soc., 78, 551-572.
CHAPANIS, A. (1953) Random-number guessing behavior. Amer. Psychologist, 8, 332.
CHERNOFF, H. (1981) How to beat the Massachusetts Numbers Game. Math. Intel., 3, 166-172.
COHEN, D. (1976) An explanation of the first digit phenomenon. J. Combinat. Th., (A) 20, 367-370.
DIACONIS, P. (1977) The distribution of leading digits and uniform distribution mod 1. Ann. Probab., 5, 72-81.
FLEHINGER, B. (1966) On the probability that a random integer has initial digit A. Amer. Math. Mo., 73, 1056-1061.
NEURINGER, A. (1986) Can people behave "randomly"?: the role of feedback. J. exp. Psychol. (Gen.), 115, 62-75.
RAIMI, R. (1969) The peculiar distribution of first significant digits. Sci. Amer., 221 (6), 109-120.
RAIMI, R. (1976) On the distribution of first significant digits. Amer. Math. Mo., 76, 342-348.
REVESZ, P. (1978) Strong theorems on coin-tossing. Proc. Internat. Congr. of Mathematicians, Helsinki, 749-754.
TUNE, G. (1964) Response preferences: a review of some relevant literature. Psychol. Bull., 61, 286-302.

Accepted April 29, 1988.