9.63 Laboratory in Cognitive Science
Fall 2005
Course 2a - Signal Detection Theory
Aude Oliva
Ben Balas, Charles Kemp

Hypothetical Experiment
Question: How does the drug LSD affect a rat's running speed?
Method & Subjects: 40 rats have been trained to run a straight-alley maze for a food reward. They are randomly assigned to two groups (Experimental group: LSD injection; Control group: injection of a placebo).
Hypothetical Experiment: Results
The distribution of running times is characterized by two measurements:
Central tendency (mean)
Dispersion: how the scores are spread out about the center (standard deviation)
[Figure: Frequency distributions. Histograms of running time (seconds) representing scores for 20 participants in the control (top) and experimental conditions of the hypothetical LSD experiment.]

Normal Distribution (curve)
[Figure: Normal distribution curve with the mean marked.]
Normal Distribution
[Figure by MIT OCW: Proportions of scores in specific areas under the normal curve, in standard (z) score units from -4 to +4. Moving outward from the mean on each side, the bands contain 34.13% (0 to 1), 13.59% (1 to 2), 2.14% (2 to 3) and 0.13% (beyond 3) of the scores. The points z = -2.67 and z = +2.67 are also marked. The inflection points are one standard deviation from the mean.]
Each side of the normal curve has a point where the curve changes its direction of curvature: this is the inflection point.
The inflection point is always one standard deviation from the mean.
About 68% of all scores are contained within one standard deviation of the mean.
About 95% of the scores are contained within 2 standard deviations.
About 99.7% of the scores are contained within 3 standard deviations.
This property of normal curves is extremely useful because if we know an individual's score and the mean and standard deviation of the distribution of scores, we also know the person's relative rank.

IQ Population Distribution
[Figure: Normal distribution of IQ in the population. x-axis: IQ (55, 70, 85, 100, 115, 130, 145), with the mean at 100. y-axis: % of people with this score (2% to 14%).]
Most IQ tests are devised so that the population mean is 100 and the standard deviation is 15. If a person has an IQ of 115, she has scored higher than XX % of all people.
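The proportions quoted for the normal curve can be checked numerically. The sketch below uses Python's standard-library statistics.NormalDist; the helper names proportion_within and percentile_rank are illustrative choices, not part of the course material.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def proportion_within(k: float) -> float:
    """Proportion of scores lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def percentile_rank(score: float, mean: float, stdev: float) -> float:
    """Proportion of a normal population scoring below `score`."""
    return NormalDist(mean, stdev).cdf(score)


for k in (1, 2, 3):
    print(f"within {k} stdev: {100 * proportion_within(k):.2f}%")

# IQ example from the slide: population mean 100, standard deviation 15.
print(f"IQ 115 is above {100 * percentile_rank(115, 100, 15):.1f}% of scores")
```

The same percentile_rank call works for any normally distributed score once its mean and standard deviation are known, which is the "relative rank" property described on the slide.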
IQ Population Distribution
[Figure: Normal distribution of IQ in the population. x-axis: IQ (55, 70, 85, 100, 115, 130, 145). y-axis: % of people with this score (2% to 14%). The low end of the axis is labeled "Mentally Retarded" and the high end "Mentally Gifted".]

Normal Curve and Z scores
[Figure by MIT OCW: Proportions of scores in specific areas under the normal curve, in standard (z) score units. The inflection points are one standard deviation from the mean.]
It is common to compare scores across normal distributions with different means and variances in terms of standard scores, or z-scores.
The z-score is the difference between an individual score and the mean, expressed in units of standard deviations.
So, an IQ of 115 translates to a z-score of X? An IQ of 78 translates to a z-score of Y?
Grades in courses should be calculated in terms of z-scores. Why?
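The z-score definition above is a one-line computation. The following minimal sketch applies it to the two IQ questions, assuming the population mean of 100 and standard deviation of 15 given earlier; the function name z_score is an illustrative choice.

```python
def z_score(score: float, mean: float, stdev: float) -> float:
    """Distance of `score` from `mean`, in units of standard deviations."""
    return (score - mean) / stdev


# IQ questions from the slide (mean 100, standard deviation 15).
for iq in (115, 78):
    print(f"IQ {iq}: z = {z_score(iq, 100, 15):.2f}")
```

Because a z-score removes the mean and the spread of the original scale, scores from distributions with different means and variances land on a common scale, which is the point behind the question about course grades.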
Detection Task: Reaction Time Distribution
Figure removed due to copyright reasons.
Histogram distribution of all reaction times for 6 participants (4200 samples total) for the target-present case.
Mean = 471 msec
Stdev = 141 msec
3 stdev from the mean = 874 msec

Detection Task: Reaction Time Distribution
Figures removed due to copyright reasons. Please see: Bacon-Macé, Nadège, et al. Figures 1 and 3A in "The time course of visual processing: Backward masking and natural scene categorisation." Vision Research 45 (2005): 1459-1469.
Masking effects on behavioral reaction time. Reaction time distribution of correct go responses as a function of the SOA (10 msec bins), averaged over 16 subjects.
Visual Search: Reaction Time Distribution
Figure removed due to copyright reasons. Please see: Wolfe, Jeremy M., and Aude Oliva, et al. Figures 13 and 16 in "Segmentation of objects from backgrounds in visual search tasks." Vision Research 42 (2002): 2985-3004.

Signal Detection Theory
Starting point of signal detection theory: all reasoning and decision making takes place in the presence of uncertainty.
Your decision depends on the signal, but also on your response bias and internal criterion.
A tumor scenario
You are a radiologist examining a CT scan, looking for a tumor. The task is hard, so there is some uncertainty: either there is a tumor (signal present), or there is not (signal absent). Either you see a tumor (response "yes"), or you do not see a tumor (response "no"). There are 4 possible outcomes:

             Signal Present    Signal Absent
Say "Yes"    Hit               False Alarm
Say "No"     Miss              Correct Rejection

Adapted from David Heeger document

Decision making process
Two main components:
(1) The signal or information. You look at the information in the CT scan. A tumor might be brighter or darker, have a different texture, etc. With expertise and additional information (other scans), the likelihood of getting a HIT or CORRECT REJECTION increases.
Adapted from David Heeger document
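The four outcomes of the tumor scenario follow directly from two yes/no facts: whether the signal was present and whether the observer said "yes". A minimal sketch (the function name classify_trial is hypothetical):

```python
def classify_trial(signal_present: bool, said_yes: bool) -> str:
    """Map one trial onto the 2 x 2 outcome table of signal detection theory."""
    if signal_present:
        return "hit" if said_yes else "miss"
    return "false alarm" if said_yes else "correct rejection"


# Enumerate the full table.
for signal_present in (True, False):
    for said_yes in (True, False):
        outcome = classify_trial(signal_present, said_yes)
        print(f"signal_present={signal_present}, said_yes={said_yes}: {outcome}")
```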
Decision making process
Two main components:
(2) Criterion: the second component of the decision process is very different: it refers to your own judgment or internal criterion. For instance, for two doctors:
Criterion "life and death (and money)": this doctor accepts an increase in false alarms and leans toward a "yes" (tumor present) decision. A false alarm will result in a routine biopsy operation. This doctor has a bias toward "yes": a liberal response strategy.
Criterion "unnecessary surgery": surgeries are very bad (expensive, stressful). This doctor will miss more tumors and save money for the social system, reasoning that a tumor, if there really is one, will be picked up at the next check-up. This doctor has a bias toward "no": a conservative response strategy.
Adapted from David Heeger document

Internal Response and Internal noise
Content removed due to copyright reasons. Refer to: Heeger, David. Signal Detection Theory. Department of Psychology, New York University, 2003.
Probability of Occurrence Curves
Content removed due to copyright reasons. Refer to: Heeger, David. Signal Detection Theory. Department of Psychology, New York University, 2003.
Hypothetical internal response curves
SDT assumes that your internal response will vary randomly over trials around an average value, producing a normal distribution of internal responses.
The decision process compares the strength of the internal (sensory) response to an internally set criterion: whenever the internal response is greater than this criterion, respond "yes". Whenever the internal response is less than the criterion, respond "no".
Figure removed due to copyright reasons. Refer to: Heeger, David. Figure 2 in Signal Detection Theory. Department of Psychology, New York University, 2003.
The decision process is influenced by knowledge of the probability of signal events (cf. Wolfe et al., Nature, 2005) and payoff factors.
The criterion line divides the graph into 4 sections (hits, misses, false alarms, correct rejections). On both hits and false alarms, the internal response is greater than the criterion.
Adapted from David Heeger document

Effects of Criterion
If you choose a low criterion, you respond "yes" to almost everything (you almost never miss a tumor and have a very high HIT rate, but cause a lot of unnecessary surgeries). If you choose a high criterion, you respond "no" to almost everything.
Figure removed due to copyright reasons. Refer to: Heeger, David. Figure 2 in Signal Detection Theory. Department of Psychology, New York University, 2003.
Adapted from David Heeger document
SDT and d-prime
The underlying model of SDT consists of two normal distributions, one representing a signal (target present) and another representing "noise" (target absent).
The willingness of the person to say "Signal Present" in response to an ambiguous stimulus is represented by the criterion.
How well a person can discriminate between Signal Present and Signal Absent trials is represented by the difference between the means of the two distributions, d'.
http://wise.cgu.edu/sdt/models_sdt1.html

Criterion
Data from an experiment with target present/absent: Where do I start?
Everything is in the HIT and FA rates. For instance:
HIT = 0.84
FA = 0.16
Response bias: c = -0.5 [z(H) + z(F)]
Sensitivity (discrimination): d' = z(H) - z(F)
D-prime: d' = separation / spread
d' = z(hit rate) - z(false alarm rate)
FA = 0.16: proportion of "yes" responses given the target is absent
HIT = 0.84: proportion of "yes" responses given the target is present
[Figure: Target-absent and target-present distributions with the criterion marked.]

Only HIT and FA are needed
HIT = 0.84: the area of the target-present distribution to the right of the criterion is 84%
FA = 0.16: the area of the target-absent distribution to the right of the criterion is 16%
Response bias: c = -0.5 [z(H) + z(F)]
c = -0.5 [z(0.84) + z(0.16)]
c = -0.5 [1 + (-1)]   (rounding z(0.84) to 1 and z(0.16) to -1)
c = 0 (no bias)
Sensitivity: d' = z(H) - z(F)
d' = z(0.84) - z(0.16)
d' = 1 - (-1)
d' = 2
In Excel: z(x) is NORMINV(x,0,1), e.g. NORMINV(0.84,0,1)
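The worked example with HIT = 0.84 and FA = 0.16 can be reproduced outside Excel. The sketch below is one possible Python equivalent, assuming that NormalDist().inv_cdf from the standard library plays the role of NORMINV(x,0,1); the function names z, d_prime and criterion_c are illustrative.

```python
from statistics import NormalDist


def z(p: float) -> float:
    """Inverse of the standard normal CDF (requires 0 < p < 1)."""
    return NormalDist().inv_cdf(p)


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity: d' = z(H) - z(F)."""
    return z(hit_rate) - z(fa_rate)


def criterion_c(hit_rate: float, fa_rate: float) -> float:
    """Response bias: c = -0.5 * [z(H) + z(F)]."""
    return -0.5 * (z(hit_rate) + z(fa_rate))


hit, fa = 0.84, 0.16
print(f"d' = {d_prime(hit, fa):.2f}")        # about 1.99; the slide rounds to 2
print(f"|c| = {abs(criterion_c(hit, fa)):.2f}")  # about 0: no bias
```

Note that z is undefined for rates of exactly 0 or 1, so data sets with perfect hit rates or no false alarms need some correction before these formulas apply; the slides do not specify one.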
D-prime: d' = separation / spread
d' = z(hit rate) - z(false alarm rate)
FA = ?? (proportion of "yes" responses given the target is absent)
HIT = ?? (proportion of "yes" responses given the target is present)
[Figure: Target-absent and target-present distributions with a higher criterion.]

D-prime: d' = separation / spread
d' = z(hit rate) - z(false alarm rate)
FA = (very small): proportion of "yes" responses given the target is absent
HIT = 0.5: proportion of "yes" responses given the target is present
[Figure: Target-absent and target-present distributions with a higher criterion.]
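The effect of moving the criterion can be sketched numerically. The code below assumes the equal-variance model used in these slides, with the target-absent distribution centered at 0, the target-present distribution centered at d', and both with a standard deviation of 1. The function name rates_for_criterion and the choice of criterion positions are illustrative.

```python
from statistics import NormalDist


def rates_for_criterion(d_prime: float, criterion: float) -> tuple[float, float]:
    """Return (hit rate, false alarm rate) for a criterion placed on the
    internal-response axis, under the equal-variance assumption."""
    target_absent = NormalDist(0.0, 1.0)
    target_present = NormalDist(d_prime, 1.0)
    # Both rates are areas to the right of the criterion.
    hit_rate = 1.0 - target_present.cdf(criterion)
    fa_rate = 1.0 - target_absent.cdf(criterion)
    return hit_rate, fa_rate


# With d' = 2: a criterion midway between the means, then a higher one
# placed at the mean of the target-present distribution.
for criterion in (1.0, 2.0):
    hit, fa = rates_for_criterion(2.0, criterion)
    print(f"criterion = {criterion}: HIT = {hit:.2f}, FA = {fa:.3f}")
```

Raising the criterion lowers both rates at once: the hit rate drops from about 0.84 to 0.5 while the false alarm rate becomes very small, and d' itself is unchanged.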
Area under normal curve and z-score
[Figure: Normal curve with 1 stdev marked; 84% of the area lies to the left.]
Z-scores are measured in standard deviations from the mean.
The area to the right of a z-score is the probability that a draw from the normal distribution will be above the z-score.
Slide adapted from Ben Backus, Uni. Pennsylvania

Conclusion on d'
d' is a measure of sensitivity. The larger the d' value, the better your performance. A d' value of zero means that you cannot distinguish trials with the target from trials without the target. A d' of 4.6 indicates a nearly perfect ability to distinguish between trials that included the target and trials that did not include the target.
c is a measure of response bias. A value greater than 0 indicates a conservative bias (a tendency to say "absent" more than "present") and a value less than 0 indicates a liberal bias. Values close to 0 indicate neutral bias.
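To see why a d' of 0 means no discrimination and a d' of 4.6 means nearly perfect performance, it helps to convert d' back into areas under the curve. The sketch below assumes an unbiased observer (c = 0) in the equal-variance model, so the criterion sits halfway between the two means; the function names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()


def area_left(z_value: float) -> float:
    """Area under the standard normal curve to the left of z_value."""
    return std_normal.cdf(z_value)


def area_right(z_value: float) -> float:
    """Area under the standard normal curve to the right of z_value."""
    return 1.0 - std_normal.cdf(z_value)


def unbiased_rates(d_prime: float) -> tuple[float, float]:
    """(hit rate, false alarm rate) when the criterion is midway between
    the target-absent and target-present means (assumes c = 0)."""
    half = d_prime / 2.0
    return area_left(half), area_right(half)


print(f"area left of z = 1: {area_left(1.0):.2f}")
for d in (0.0, 2.0, 4.6):
    hit, fa = unbiased_rates(d)
    print(f"d' = {d}: HIT = {hit:.3f}, FA = {fa:.3f}")
```

At d' = 0 the hit and false alarm rates are both 0.5, so "yes" responses carry no information about the target; at d' = 4.6 the hit rate is close to 0.99 and the false alarm rate close to 0.01.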
CogLab 1: Signal Detection Results
d' is a measure of sensitivity. A d' value of zero means that ?????? A d' of 4.6 indicates a nearly perfect ability to distinguish between trials that included the target and trials that did not include the target.
c is a measure of response bias. A value greater than 0 indicates ????? A value less than 0 indicates ????? Values close to 0 indicate ?????
[Figure: Bar chart of hit and false alarm rates (y-axis from 0.00 to 0.90), N = 6.]

SDT, Perception and Memory
"Yes-No" paradigms
A research domain where SDT has been successfully applied is the study of memory. Typically in memory experiments, participants are shown a list of words and later asked to make a "yes" or "no" statement as to whether they remember seeing an item before. Alternatively, participants make "old" or "new" responses. The results of the experiment can be portrayed in what is called a decision matrix. The hit rate is defined as the proportion of "old" responses given to items that are old, and the false alarm rate is the proportion of "old" responses given to items that are new.
[Figure: Hypothetical distributions of "yes" and "no" responses.] The decision criterion C determines whether a "yes" or "no" response will be made. Strong evidence to the right of the criterion will lead to "yes" responses and weak evidence to the left will lead to "no" responses.
Interactive program: the WISE Project's Signal Detection Theory Tutorial
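The decision matrix of an old/new memory experiment reduces to a hit rate and a false alarm rate, from which d' follows. The sketch below uses made-up counts purely for illustration (50 old and 50 new test items); none of these numbers come from the course data, and the function names are hypothetical.

```python
from statistics import NormalDist


def rates_from_counts(hits: int, misses: int,
                      false_alarms: int, correct_rejections: int) -> tuple[float, float]:
    """Hit rate = "old" responses / old items;
    false alarm rate = "old" responses / new items."""
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate, fa_rate


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity: d' = z(H) - z(F), with z the inverse standard normal CDF."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical decision matrix: 40 hits, 10 misses, 10 false alarms,
# 40 correct rejections.
hit_rate, fa_rate = rates_from_counts(hits=40, misses=10,
                                      false_alarms=10, correct_rejections=40)
print(f"HIT = {hit_rate:.2f}, FA = {fa_rate:.2f}, d' = {d_prime(hit_rate, fa_rate):.2f}")
```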
Animal Detection Task (Project)
Images removed due to copyright reasons.
Target: Image removed due to copyright reasons.
Distractor: Image removed due to copyright reasons.
Method: experiment ready to run in MATLAB

False Memory (Project)
Scene memory and visual complexity (clutter)
Images removed due to copyright reasons.
Method: pictures already ranked for complexity. Experiment can be done with PowerPoint.
Also: memory of comics drawings, memory of places (e.g. for eyewitness testimony), memory of emotional images, memory under dual tasks, short-term memory (change blindness paradigm, etc.), memory of emotions-faces, etc.
Costs and Utilities of d'
What are the costs of a false alarm and of a miss for the following:
A pilot emerges from the fog and estimates whether her position is suitable for landing
A doctor estimates whether a fuzzy spot could be a tumor
You are screening bags at the airport

Wolfe et al. (2005) Detection of Rare Targets - Visual Search (project)
Refer to: Wolfe, Jeremy M., Todd S. Horowitz, and Naomi M. Kenner. "Rare items often missed in visual searches." Nature 435 (2005): 439-440.
Our society relies on accurate performance in visual screening tasks. These are visual searches for rare targets: we show here that target rarity leads to disturbingly inaccurate performance in target detection.
What happened?
Find the tool
When tools are present on 50% of trials, observers missed 5-10% of them.
When the same tools are present on just 1% of trials, observers missed 30-40% of them.
A problem with performance, not searcher competence.
Courtesy of Dr. Jeremy Wolfe. Used with permission.

Here, the important errors are Misses
Figure removed due to copyright reasons. Please see: Wolfe, Jeremy M., Todd S. Horowitz, and Naomi M. Kenner. Figure 1 in "Rare items often missed in visual searches." Nature 435 (2005): 439-440.
The Gambler's Fallacy
[Figure: Reaction time (msec, 1000 to 2000) as a function of trial relative to the target-present trial (-8 to 8).]
Right after a MISS, RTs jump way up.
Courtesy of Dr. Jeremy Wolfe. Used with permission.

But I am sure another rare target won't come again soon, right?
[Figure: Reaction time (msec, 1000 to 2000) as a function of trial relative to the target-present trial (-8 to 8).]
So I don't learn from my error.
Courtesy of Dr. Jeremy Wolfe. Used with permission.
And I do the same after a Hit.
[Figure: Reaction time (msec) as a function of trial relative to the target-present trial (-8 to 8).]
Courtesy of Dr. Jeremy Wolfe. Used with permission.

The Gambler's Fallacy
[Figure: Reaction time (msec) as a function of trial relative to the target-present trial (-8 to 8).]
Courtesy of Dr. Jeremy Wolfe. Used with permission.
The z-score
z_i = (x_i - m) / s, where m is the mean of the scores and s is their standard deviation
How many standard deviations above the mean is score x_i?
x = 18, 24, 12, 6      z = .4, 1.3, -.4, -1.3
x = 1.5, 2, 1, 0.5     z = .4, 1.3, -.4, -1.3
Courtesy of Ruth Rosenholtz. Used with permission.

Example use of z-scores
How do you combine scores from different people?
Mary, Jeff, and Raul see a movie.
Their ratings of the movie, on a scale from 1 to 10: Mary = 7, Jeff = 9, Raul = 5
Average score = 7?
It's more meaningful to see how those scores compare to how they typically rate movies.
Courtesy of Ruth Rosenholtz. Used with permission.
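The two example data sets on the z-score slide differ only by a scale factor (each value in the second is the first divided by 12), which is why they produce the same z-scores. A minimal sketch, assuming the population standard deviation (dividing by N), which is the convention that reproduces the slide's rounded values; the function name z_scores is illustrative.

```python
from statistics import mean, pstdev


def z_scores(xs: list[float]) -> list[float]:
    """z-score of every value, using the population standard deviation
    (an assumption: it matches the rounded values on the slide)."""
    m = mean(xs)
    s = pstdev(xs)
    return [(x - m) / s for x in xs]


for xs in ([18, 24, 12, 6], [1.5, 2, 1, 0.5]):
    print(xs, "->", [round(z, 1) for z in z_scores(xs)])
```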
Example use of z-scores
Recent ratings from Mary, Jeff, & Raul:
Mary: 7, 8, 7, 9, 8, 9      (7 is pretty low)
Jeff: 8, 9, 9, 10, 8, 10    (9 is average)
Raul: 1, 2, 2, 4, 5         (5 is pretty good)
z-scores:
Mary: m = 8, s = .8         z(7) = -1.2
Jeff: m = 9                 z(9) = 0
Raul: m = 2.8, s = 1.5      z(5) = 1.5
mean(z) = 0.1 = # of standard deviations above the mean
It's probably your average movie, nothing outstanding.
Courtesy of Ruth Rosenholtz. Used with permission.
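The movie example can be reproduced by standardizing each person's new rating against that person's own recent ratings and then averaging the z-scores. The sketch below again assumes the population standard deviation, which matches the s values on the slide; the variable and function names are illustrative.

```python
from statistics import mean, pstdev

# Each person's recent ratings and their rating of the new movie (from the slides).
recent_ratings = {
    "Mary": [7, 8, 7, 9, 8, 9],
    "Jeff": [8, 9, 9, 10, 8, 10],
    "Raul": [1, 2, 2, 4, 5],
}
new_rating = {"Mary": 7, "Jeff": 9, "Raul": 5}


def z_against_history(rating: float, history: list[float]) -> float:
    """z-score of `rating` relative to a person's own rating history
    (population standard deviation, as assumed above)."""
    return (rating - mean(history)) / pstdev(history)


movie_z = {name: z_against_history(new_rating[name], history)
           for name, history in recent_ratings.items()}

for name, z in movie_z.items():
    print(f"{name}: z = {z:.1f}")
print(f"mean(z) = {mean(movie_z.values()):.1f}")
```

Averaging the raw ratings gives 7, but averaging the z-scores gives a value close to 0, which supports the slide's reading of the movie as roughly average for these three viewers.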