Reliable and Valid Measures of Threat Detection Performance in X-ray Screening
IEEE ICCST 2004 Proceedings

Franziska Hofer, Adrian Schwaninger
Department of Psychology, University of Zurich, Switzerland

Abstract
Over the last decades, airport security technology has evolved remarkably. This is especially evident where state-of-the-art detection systems are concerned. However, such systems are only as effective as the personnel who operate them. Reliable and valid measures of screener detection performance are important for risk analysis, screener certification and competency assessment, as well as for measuring quality performance and the effectiveness of training systems. In many of these applications the hit rate is used to measure detection performance. However, measures based on signal detection theory have gained popularity in recent years, for example in the analysis of data from threat image projection (TIP) or computer based training (CBT) systems. In this study, computer-based tests were used to measure detection performance for improvised explosive devices (IEDs). These tests were conducted before and after training with an individually adaptive CBT system. The following measures were calculated: pHit, d', Δm, Az, A', and p(c)max. All measures correlated well, but ROC curve analysis suggests that nonparametric measures are more valid for measuring detection performance for IEDs. More specifically, we found systematic deviations in the ROC curves that are consistent with the two-state low threshold theory of [9]. These results have to be studied further, and the question arises whether similar results would be obtained for other X-ray screening data. In any case, it is recommended to use A' in addition to d' in practical applications such as certification, threat image projection and CBT, rather than the hit rate alone.

Index Terms: human factors in aviation security, hit rate, signal detection theory, threat detection in X-ray screening, computer based training system, threat image projection.

This research was financially supported by Zurich Airport Unique, Switzerland. Franziska Hofer and Adrian Schwaninger are with the Department of Psychology, University of Zurich, Switzerland.

I. INTRODUCTION

Technological progress has enabled state-of-the-art X-ray screening systems to become quite sophisticated. Current systems provide high image resolution and several image enhancement features (e.g. zoom, and filter functions such as negative image, edge detection, etc.). But technology is only as effective as the humans who operate it. This has been recognized more and more in recent years, and the relevance of human factors research has increased substantially. Note that during rush hour, aviation security screeners often have only a few seconds to inspect an X-ray image of a passenger bag and to judge whether the bag contains a forbidden object (NOT OK) or whether it is OK. Threat object recognition depends largely on perceptual experience and training []. Most object recognition models agree with the view that the recognition process involves matching an internal representation of the stimulus to a stored representation in visual memory (for an overview see [2] and [3]). If a certain type of forbidden item has never been seen before, there is no representation of it in visual memory, and the object becomes very difficult to recognize unless it is similar to stored views of another object.
Besides the aspect of memory representation, image-based factors can also affect recognition substantially (for a detailed discussion see [4]). When objects are rotated, they become more difficult to recognize (effect of view). In addition, objects can be superimposed by other objects, which can impair detection performance (effect of superposition). Moreover, the number and type of other objects in the bag challenge visual processing capacity, which can affect detection performance as well (effect of bag complexity). While CBT can increase detection performance substantially [1], recent results suggest that training effects are smaller when it comes to increasing the ability to cope with image-based factors such as effects of view, superposition and bag complexity [4]. However, this conclusion relies on the availability of reliable and valid measures of detection performance. For example, the hit rate is not a valid measure for estimating detection performance in a computer-based test in which screeners are exposed to X-ray images of passenger bags and have to make OK / NOT OK decisions. The reason is simple: a candidate could achieve a high hit rate by simply judging most bags as NOT OK. In order to distinguish between a liberal response bias and true detection ability, the false alarm rate needs to be taken into account as well. This is certainly part of the reason why signal detection theory (SDT)
has been used for analyzing X-ray screening data (see for example [5] and []). In general, reliable and valid measures of detection performance are certainly very important for risk analysis, screener certification and competency assessment, as well as for measuring quality performance and the effectiveness of training systems. The objective of this study was to compare different measures of screener detection performance. As clearly shown by [1], detection performance depends substantially on perceptual experience and training, at least for certain types of threat items. In order to evaluate different performance measures, we used computer-based tests before and after CBT. Using a baseline test and a test after the training period makes it possible to compare different detection measures with regard to their reliability and validity while taking effects of training into account. The following detection measures were compared in this study: pHit, d', Δm, Az, A', and p(c)max. These measures and the corresponding detection models are summarized in the following section.

II. DETECTION MODELS AND PERFORMANCE MEASURES

Signals are always detected against a background of activity, also called noise. Thus, detecting threat objects in passenger bags can be described as a typical detection task, in which the signal is the threat object and the bag containing different harmless objects constitutes the noise. A correctly identified threat object corresponds to a hit, whereas a bag that contains no threat item and is judged as harmless represents a correct rejection. Judging a harmless bag as being dangerous is a false alarm, whereas overlooking a forbidden object in a bag represents a miss. In every detection situation, the observer must first make an observation and then make a decision about this observation. Signal detection theory [6], [7] and threshold theories [8], [9] are two fundamentally different approaches to conceptualizing human perception. The main difference between the two approaches is that threshold theories suppose a theoretical threshold, whereas in SDT the concept of a threshold is rejected in favor of an adjustable decision criterion. Threshold theories can be coarsely divided into high threshold theories (e.g. single high-threshold theory or double high-threshold theory) and low threshold theories. These approaches assert that the decision space is characterized by a few discrete states, rather than by the continuous dimensions of SDT. Single high-threshold theory predicts ROC curves which are often not consistent with experimental data [], [9]. The other types of threshold theories are low threshold theories, originally described by [] and []. The two-state low threshold theory is a slightly newer version by [9]. This theory is often as consistent with the data as SDT models are. But because no single sensitivity measure exists for this theory, it is not widely applied [7]. According to SDT, the subject's decision is guided by information derived from the stimulus and by the relative placement of a response or decision criterion. An anxious person would set the criterion very low, so that even a very weak observation would lead to a signal response. In contrast, another person might set the criterion very high, so that the sensory observation needs to be very strong before this person gives a "signal present" answer. It is important to note that different persons can have different criterion locations, and it is also possible that the same person changes the location of the criterion over time.
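As a concrete illustration of these four outcome categories, and of why a liberal criterion inflates the hit rate and the false alarm rate together, the following minimal Python sketch tallies the two rates for two hypothetical screeners (the trial data are invented and not taken from this study):

```python
# Minimal sketch with invented trial data. Each trial records whether the bag
# really contained a threat and what the screener answered ("OK" / "NOT OK").
def rates(trials):
    hits = sum(1 for threat, resp in trials if threat and resp == "NOT OK")
    misses = sum(1 for threat, resp in trials if threat and resp == "OK")
    false_alarms = sum(1 for threat, resp in trials if not threat and resp == "NOT OK")
    correct_rejections = sum(1 for threat, resp in trials if not threat and resp == "OK")
    p_hit = hits / (hits + misses)                              # hit rate
    p_fa = false_alarms / (false_alarms + correct_rejections)   # false alarm rate
    return p_hit, p_fa

# A conservative screener and a very liberal one who calls almost every bag NOT OK.
conservative = [(True, "NOT OK"), (True, "OK"), (False, "OK"), (False, "OK")]
liberal = [(True, "NOT OK"), (True, "NOT OK"), (False, "NOT OK"), (False, "OK")]

print(rates(conservative))  # (0.5, 0.0)
print(rates(liberal))       # (1.0, 0.5): a higher hit rate, but bought with false alarms
```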
For example, the day after 9/11, many airport security screeners moved their criterion in such a direction that at the smallest uncertainty they judged a passenger bag as being dangerous. Although the hit rate increases in this case, detection performance stays more or less stable, because the false alarm rate increases as well. In contrast to signal detection theory, threshold theories suppose that it is not the locus of the criterion that causes the answer but a theoretical threshold. In the two-state low threshold theory by [9], the threshold is assumed to lie somewhere between the middle and the upper end of the noise distribution. During a sensory observation, an observer is in the detect state if the observation exceeds the threshold and in the nondetect state if the observation is below the threshold. The response one makes in either state may be biased by nonsensory factors: a person can say yes when in the nondetect state or say no when in the detect state. Manipulating variables such as payoff and signal probability changes the observer's response bias when they are in either one of the two possible detection states. The main disadvantage of this low threshold theory is its lack of a single sensitivity measure that can be calculated from hits and false alarms.

Different signal detection measures and sensitivity measures derived from threshold theories exist. One of the most popular and most often used parametric SDT measures is d'. It is calculated by subtracting the standardized false alarm rate from the standardized hit rate, i.e. d' = z(pHit) - z(pFA). A detection performance of d' = 0 means that the screener had exactly the same hit and false alarm rate, in other words that this screener just guessed. This measure may only be calculated under the assumptions that the theoretical signal-plus-noise distribution and the noise distribution are 1) normally distributed (binormal ROC curves) and 2) of equal variance. These assumptions can be tested with receiver operating characteristic (ROC) curves, in which the proportion of hits is plotted as a function of the proportion of false alarms at different locations of the criterion. Maximum likelihood (ML) estimation algorithms for fitting binormal ROC curves are available (see [13]-[15]). The second assumption can be tested with the slope of the standardized ROC curve (if the variances are equal, the slope of the standardized ROC curve is 1). If the variances are unequal, another signal detection measure, Δm, is often used. One disadvantage of this measure is that it can only be computed when ROC curves are available. d' and Δm express sensitivity in terms of the difference between the means of the noise and signal-plus-noise distributions, expressed in units of the noise distribution. If the ROC curves are not binormal, it is still possible to express sensitivity as the area under the ROC curve.
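The following sketch illustrates these computations in Python, assuming SciPy and NumPy are available. The single hit/false alarm pair and the rating-based ROC points are invented, and the Az and Δm lines are included only as one common way of deriving such measures from a fitted standardized ROC, not as the procedure used in this study:

```python
# Minimal sketch (not the paper's code). d' and the response criterion from a
# single hit / false alarm rate pair, plus a rough check of the equal-variance
# assumption via the slope of a standardized (z) ROC.
import numpy as np
from scipy.stats import norm

def d_prime(p_hit, p_fa):
    """d' = z(hit rate) - z(false alarm rate)."""
    return norm.ppf(p_hit) - norm.ppf(p_fa)

def criterion_c(p_hit, p_fa):
    """Decision criterion c; negative values indicate a liberal response bias."""
    return -0.5 * (norm.ppf(p_hit) + norm.ppf(p_fa))

print(d_prime(0.90, 0.20), criterion_c(0.90, 0.20))   # about 2.12 and -0.22

# Multi-point ROC (e.g. from confidence ratings): fit z(H) = a + b * z(F).
# If the equal-variance assumption holds, the slope b should be close to 1.
roc_hits = np.array([0.55, 0.75, 0.90, 0.97])
roc_fas = np.array([0.10, 0.25, 0.45, 0.70])
b, a = np.polyfit(norm.ppf(roc_fas), norm.ppf(roc_hits), 1)

# Under this binormal parameterization, two further measures can be derived:
az = norm.cdf(a / np.sqrt(1 + b ** 2))   # area under the fitted binormal ROC
delta_m = a / b                          # mean separation in noise-SD units
print(b, az, delta_m)
```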
Another well-known measure, which is nonparametric (or sometimes also called distribution-free), is A'; it was first proposed by [16]. The term nonparametric refers to the fact that the computation of A' requires no a priori assumptions about the underlying distributions. A' can be calculated when ROC curves are not available and the validity of the normal distribution and equal variance assumptions for the signal-plus-noise and noise distributions cannot be verified. A' can be calculated by the following formula [17]:

A' = .5 + [(H - F)(1 + H - F)] / [4H(1 - F)],

where H is the hit rate and F the false alarm rate. If the false alarm rate is greater than the hit rate, the equation must be modified [18], [19]:

A' = .5 - [(F - H)(1 + F - H)] / [4F(1 - H)]

As [20] have pointed out, this does not mean that these measures are an accurate reflection of their theoretical origin (i.e. that A' reflects the area under a reasonable ROC curve), or that A' is a distribution-free measure or fully independent of response bias (see also [21]). Thus, the term nonparametric is somewhat misleading. A further disadvantage of A' is that it underestimates detection ability by an amount that is a function of the magnitude of bias and decision ability []. But because A' can be computed easily and no further assumptions about the underlying noise and signal-plus-noise distributions have to be made, researchers often use this measure when the assumptions of SDT are not fulfilled or cannot be tested. Another measure that is sometimes used is the unbiased proportion correct, p(c)max. This measure can be calculated from d' and used instead of A' (see [7], []). Like d', it is independent of any response bias. Whenever p(c)max is used, double high-threshold theory is implied (for detailed information on double high-threshold theory see []; summarized in [6]).
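The two A' formulas, together with the standard equal-variance relation p(c)max = Φ(d'/2), can be written out as a small Python sketch (SciPy is assumed to be available; the example rates are invented):

```python
# Minimal sketch (SciPy assumed). A' via Grier's formulas, including the
# below-chance variant, and p(c)max derived from d' as Phi(d'/2).
from scipy.stats import norm

def a_prime(h, f):
    """A' for hit rate h and false alarm rate f (extreme rates of 0 or 1 need corrections)."""
    if h >= f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))

def pc_max(d_prime_value):
    """Unbiased proportion correct implied by d' under equal-variance SDT."""
    return norm.cdf(d_prime_value / 2)

print(a_prime(0.90, 0.20))  # about 0.91
print(pc_max(2.12))         # about 0.86
```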
To investigate which performance measures for threat detection in X-ray images are valid and reliable for novices as well as for experts, it is important to examine the form of the ROC curves prior to and after training. Note that signal detection and threshold theories predict different forms of ROC curves. In linear coordinates, the ROC curve predicted by the two-state low threshold theory of [9] consists of two straight lines, whereas the ROC curve predicted by signal detection theory is a symmetrical curve (see Figure 1).

Fig. 1. ROC implied by signal detection theory (solid curve) and by two-state low threshold theory from [9] (dashed lines). Hit rate p(yes|sn) is plotted against false alarm rate p(yes|n).

III. METHOD

We used a computer-based training (CBT) system for improvised explosive devices (IEDs), which was developed on the basis of object recognition theories and research on visual cognition. For detailed information on this CBT system (X-Ray Tutor) see [], [] and [4].

A. Participants

The original sample consisted of seventy-two participants (fifty females) with a mean age of approximately 48 years (SD approximately 9 years). Data of ten participants were not included in the analyses of this study because, for at least one test date, the slope of their standardized ROC curve was close to zero. Thus, data of sixty-two participants were used for the analyses in this study. None of the participants had received CBT before.

B. Training design

The detailed design of the data collection and the materials can be found in a recently published evaluation study of the CBT [1]. In summary, four groups of participants had access to the CBT from December to May. There were four training blocks, counterbalanced across the four groups of trainees using a Latin square design. Prior to each training block, performance tests were taken that contained the IEDs of the following training block. This method made it possible to measure training effectiveness for IEDs never seen before in a standardized way. Training and testing blocks consisted of sixteen IEDs. For the training, sixteen difficulty levels were constructed by combining each IED with bags of different complexities. At test, only the two most difficult combinations of IEDs and bags were used. To test training effects for different display durations, each bag was presented for 4 and for 8 seconds. All four tests consisted of 128 trials: 16 (IEDs) x 2 (two most difficult levels) x 2 (4 and 8 sec display durations) x 2 (harmless vs. threat bags). The order of trial presentation was randomized. For each trial, participants had to judge whether the X-ray image of the bag contained an IED (NOT OK response) or not (OK response). In addition, a confidence rating ranging from "very easy" to "very difficult" was given after each decision.
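The ROC analyses reported in the next section combine these decisions and confidence ratings into graded certainty categories. As a minimal sketch with invented counts, such rating data can be cumulated into empirical ROC points as follows:

```python
# Minimal sketch with invented counts: cumulate rating categories from
# "IED for sure" down to "no IED for sure" to obtain empirical ROC points.
import numpy as np

signal_counts = np.array([40, 25, 15, 10, 6, 4])    # bags that contained an IED
noise_counts = np.array([5, 10, 15, 20, 22, 28])    # harmless bags

hit_rates = np.cumsum(signal_counts) / signal_counts.sum()
fa_rates = np.cumsum(noise_counts) / noise_counts.sum()

# Each (false alarm rate, hit rate) pair is one point of the ROC; the last point is (1, 1).
for fa, hit in zip(fa_rates, hit_rates):
    print(round(fa, 3), round(hit, 3))
```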
For the purposes of this study we analyzed data of the first detection performance test, conducted in Dec/Jan (Test 1, prior to training), and data of the third detection performance test, conducted in March/April (Test 2, after an individually varying number of training sessions; SD = 8 sessions).

IV. RESULTS

In order to plot ROC curves, confidence ratings were divided into categories ranging from "bag contains no IED for sure" to "bag contains an IED for sure". Figure 2 shows the pooled unstandardized ROC curves prior to and after training for display durations of four (a, c) and eight seconds (b, d). As can be seen in Figure 2, the ROC curves seem to be better fitted by two straight lines than by a binormal ROC curve as would be predicted from SDT.

Fig. 2. Unstandardized ROC curves (hit rate plotted against false alarm rate) for the two test dates (prior to and after training), based on pooled data from 62 participants. a, b) ROC curves prior to training with display durations of 4 sec (a) and 8 sec (b). c, d) ROC curves after training with display durations of 4 sec (c) and 8 sec (d).

As mentioned in section II, such ROC curves are predicted by the two-state low threshold theory [9] and are not consistent with the Gaussian distribution assumptions of SDT. Standardized ROC curves are shown in Figure 3, and one can clearly see that none of them is linear, as would be predicted by SDT. This was confirmed in individual ROC analyses that revealed significant deviations from linearity for several participants (χ² tests). Interestingly, the nonlinearity seems to be more pronounced after training (see the bottom halves of Figures 2 and 3). These results suggest the existence of a low threshold and challenge the validity of SDT for explaining the data obtained in this study. As a consequence, the use of parametric SDT measures as reliable and valid estimates of threat detection performance might be questioned, at least as far as detection of IEDs in X-ray images of passenger bags is concerned.

Fig. 3. Standardized (z-transformed) ROC curves for the two test dates (prior to and after training), based on pooled data from the 62 participants. a, b) ROC curves prior to training for 4 sec (a) and 8 sec (b). c, d) ROC curves after training for 4 sec (c) and 8 sec (d).

However, it remains to be investigated whether our results can be replicated using other stimuli and whether similar results can be obtained when other threat categories are used (e.g. guns, knives, dangerous goods, etc.). In any case, it is an interesting question to what extent the different detection measures correlate. Table 1 shows Pearson correlation coefficients between the detection measures d', Δm, Az, A' and the proportion correct p(c)max, calculated according to [7], for all four conditions (2 test dates x 2 display durations).

TABLE 1
CORRELATIONS BETWEEN DETECTION MEASURES (r)
(Pearson correlations among pHit, d', Δm, Az, A' and p(c)max, reported separately for Test 1, Dec/Jan, and Test 2, Mar/April.)
Note. All p-values < .05.
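As a rough illustration of how such a comparison can be computed, the following sketch derives several single-point measures from hypothetical per-participant hit and false alarm rates and correlates them; Δm and Az are omitted here because, as noted in section II, they require fitted multi-point ROC curves. All values are made up:

```python
# Minimal sketch with made-up data: single-point measures for 62 hypothetical
# screeners and the Pearson correlations between them (compare Table 1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
p_hit = rng.uniform(0.60, 0.95, size=62)   # hypothetical hit rates
p_fa = rng.uniform(0.05, 0.40, size=62)    # hypothetical false alarm rates

d_prime = norm.ppf(p_hit) - norm.ppf(p_fa)
# Grier's A' in vectorized form (valid here because every p_hit exceeds its p_fa).
a_prime = 0.5 + ((p_hit - p_fa) * (1 + p_hit - p_fa)) / (4 * p_hit * (1 - p_fa))
pc_max = norm.cdf(d_prime / 2)

measures = np.vstack([p_hit, d_prime, a_prime, pc_max])
print(np.corrcoef(measures).round(2))      # 4 x 4 Pearson correlation matrix
```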
As can be seen in Table 1, the correlations between the different measures are quite high. In general, there is a tendency toward slightly smaller correlations after training, particularly for display durations of 8 sec. Figure 4 visualizes the training effect using the different detection performance measures.

Fig. 4. Illustration of the training effect, comparing performance prior to training (Test 1, Dec/Jan) and after training (Test 2, Mar/April) for display durations of 4 and 8 sec. a) d' and Δm; b) Az, A' and p(c)max. TS = training sessions.

A large training effect is clearly apparent for each detection measure. While substantial differences in slope can be observed between d' and Δm (Figure 4a), the comparison between A', Az and p(c)max reveals relatively parallel lines (Figure 4b). Detection performance is slightly better for display durations of 8 vs. 4 seconds, which is apparent for all measures. Statistical analyses are reported only for A', since the ROC analyses did not support the parametric SDT measures. A two-way analysis of variance (ANOVA) with the two within-participants factors test date (prior to vs. after training) and display duration (4 vs. 8 sec) showed a significant main effect of test date, F(1, 61) = 4.7, p < .05, and of display duration, F(1, 61) = 86.95, p < .05. Effect sizes were high [25], with η² = .9 for test date and η² = .59 for display duration. There was no significant interaction between test date and display duration, F(1, 61) = .4, p = .8.

Internal reliability was assessed by calculating Cronbach's alpha using hits (NOT OK responses for threat images) and correct rejections (OK responses for non-threat images). Table 2 contains the reliability coefficients for the two test dates (prior to and after training) and for each group of trainees, pooling over display durations of 4 and 8 seconds.

TABLE 2
RELIABILITY ANALYSES (CRONBACH'S ALPHA)

                    Test 1 (Dec/Jan)    Test 2 (Mar/April)
Group A (N = )            .9                  .9
Group B (N = 5)           .9                  .8
Group C (N = 7)           .9                  .9
Group D (N = 8)           .9                  .9

Note. Internal reliability coefficients broken up by participant group and test date.
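A minimal sketch of this reliability computation, using a hypothetical participants-by-items matrix of 0/1 accuracy scores (with purely random data the resulting alpha is close to zero; the correlated responses in the actual tests are what produce values as high as those in Table 2):

```python
# Minimal sketch: Cronbach's alpha from a participants x items matrix of 0/1 scores.
import numpy as np

rng = np.random.default_rng(1)
scores = (rng.random((62, 128)) < 0.8).astype(float)   # hypothetical random data

def cronbach_alpha(x):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

print(cronbach_alpha(scores))   # near 0 for random data; real test data yield much higher values
```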
V. DISCUSSION

The objective of this study was to compare different measures of X-ray detection performance while taking effects of training into account. To this end, computer-based tests were used that were conducted before and after CBT. From a regulator's perspective, the hit rate is sometimes the preferred measure when data from threat image projection (TIP) are used to judge the performance of screeners. However, the hit rate alone is not a valid measure because it does not allow one to distinguish between good detection ability and a liberal response bias. For example, an anxious screener might achieve a high hit rate only because most bags are judged as being NOT OK. In this case security is achieved at the expense of efficiency, which would be reflected in long waiting lines at the checkpoint. It would be much more beneficial to achieve a high degree of security without sacrificing efficiency, which implies a high hit rate and a low false alarm rate. SDT provides several measures that take both the hit and the false alarm rate into account in order to achieve more valid measures of detection performance. Although the use of parametric SDT measures is still very common (e.g. [6], [5], []), several studies have used the nonparametric A' because its computation does not require a priori assumptions about the underlying distributions (e.g. [27], [28], [4]). ROC analysis can be used to test whether the assumptions of SDT are fulfilled. We found that standardized ROCs deviated from linearity both before and after training of IED detection. Interestingly, the unstandardized ROC curves could be fitted very well by two straight lines, just as would be predicted from the two-state low threshold theory of [9]. These results challenge the validity of SDT measures as estimates of threat detection performance, at least where detection of IEDs in X-ray images of passenger bags is concerned. It certainly remains to be investigated whether our results can be replicated with different stimulus material and with other threat items than IEDs. In any case, however, our findings suggest that detection performance measures other than those from SDT should be considered, too.
As mentioned above, the calculation of A' requires no a priori assumptions about the underlying distributions, which has often been regarded as an advantage over SDT measures such as d' and Δm. In many applications such as risk analysis, quality performance measurement, and competency assessment based on TIP or CBT data, only hit and false alarm rates are available and multipoint ROCs cannot be obtained to test the assumptions of SDT measures. At least in these cases, using A' in addition to d' should be considered; certainly both measures are more valid estimates of detection performance than the hit rate alone. Finally, it should be noted that the five psychophysical measures compared in this study were usually strongly correlated. More specifically, the measures that are most often reported in the detection literature, A' and d', correlated in all four test conditions with r >= .75. And in a recent study using computer-based tests with different types of threat items, even higher correlations between A' and d' were found (r > .9, [4]).

ACKNOWLEDGMENT

We are thankful to the Zurich State Police, Airport Division, for their help in creating the stimuli and for the good collaboration in conducting the study.

REFERENCES

[1] A. Schwaninger and F. Hofer, "Evaluation of CBT for increasing threat detection performance in X-ray screening," in The Internet Society 2004: Advances in Learning, Commerce and Security, K. Morgan and M. J. Spector, Eds. Wessex: WIT Press, 2004.
[2] M. Graf, A. Schwaninger, C. Wallraven, and H. H. Bülthoff, "Psychophysical results from experiments on recognition & categorization," Information Society Technologies (IST) programme, Cognitive Vision Systems CogVis, IST-2000-29375.
[3] A. Schwaninger, "Object recognition and signal detection," in Praxisfelder der Wahrnehmungspsychologie, B. Kersten and M. T. Groner, Eds. Bern: Huber, in press.
[4] A. Schwaninger, H. Hardmeier, and F. Hofer, "Measuring visual abilities and visual knowledge of aviation security screeners," IEEE ICCST Proceedings, this volume.
[5] J. S. McCarley, A. Kramer, C. D. Wickens, E. D. Vidoni, and W. R. Boot, "Visual skills in airport-security screening," Psychological Science, vol. 15, pp. 302-306, 2004.
[6] D. M. Green and J. A. Swets, Signal Detection Theory and Psychophysics. New York: Wiley, 1966.
[7] N. A. Macmillan and C. D. Creelman, Detection Theory: A User's Guide. Cambridge: Cambridge University Press, 1991.
[8] D. H. Krantz, "Threshold theories of signal detection," Psychological Review, vol. 76, pp. 308-324, 1969.
[9] R. D. Luce, "A threshold theory for simple detection experiments," Psychological Review, vol. 70, pp. 61-79, 1963.
[10] G. A. Gescheider, Psychophysics: The Fundamentals, 3rd ed. Mahwah, NJ: Lawrence Erlbaum Associates, 1998.
[11] J. A. Swets, "Is there a sensory threshold?," Science, vol. 134, 1961.
[12] J. A. Swets, W. P. Tanner, Jr., and T. G. Birdsall, "The evidence for a decision-making theory of visual detection," Electronic Defense Group, University of Michigan, Tech. Rep. No. 40, 1955.
[13] J. A. Swets and R. M. Pickett, Evaluation of Diagnostic Systems: Methods from Signal Detection Theory. New York: Academic Press, 1982.
[14] C. E. Metz, "Some practical issues of experimental design and data analysis in radiological ROC studies," Investigative Radiology, vol. 24, pp. 234-245, 1989.
[15] D. D. Dorfman and E. Alf, "Maximum likelihood estimation of parameters of signal detection theory and determination of confidence intervals: rating-method data," Journal of Mathematical Psychology, vol. 6, 1969.
[16] I. Pollack and D. A. Norman, "A non-parametric analysis of recognition experiments," Psychonomic Science, vol. 1, pp. 125-126, 1964.
[17] J. B. Grier, "Nonparametric indexes for sensitivity and bias: Computing formulas," Psychological Bulletin, vol. 75, pp. 424-429, 1971.
[18] D. Aaronson and B. Watt, "Extensions of Grier's computational formulas for A' and B'' to below-chance performance," Psychological Bulletin, vol. 102, 1987.
[19] J. G. Snodgrass and J. Corwin, "Pragmatics of measuring recognition memory: Applications to dementia and amnesia," Journal of Experimental Psychology: General, vol. 117, pp. 34-50, 1988.
[20] R. E. Pastore, E. J. Crawley, M. S. Berens, and M. A. Skelly, "'Nonparametric' A' and other modern misconceptions about signal detection theory," Psychonomic Bulletin & Review, vol. 10, 2003.
[21] N. A. Macmillan and C. D. Creelman, "Triangles in ROC space: History and theory of 'nonparametric' measures of sensitivity and response bias," Psychonomic Bulletin & Review, vol. 3, pp. 164-170, 1996.
[22] J. P. Egan, "Recognition memory and the operating characteristic," Hearing and Communication Laboratory, Indiana University, Tech. Note AFCRC-TN-58-51, 1958.
[23] A. Schwaninger, "Training of airport security screeners," AIRPORT, 05/2003, pp. 11-13.
[24] A. Schwaninger, "Computer based training: a powerful tool to the enhancement of human factors," Aviation Security International, February 2004.
[25] J. Cohen, Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Erlbaum, 1988.
[26] J. A. Swets, Signal Detection Theory and ROC Analysis in Psychology and Diagnostics: Collected Papers. Mahwah, NJ: Lawrence Erlbaum Associates, 1996.
[27] A. D. Fisk and W. Schneider, "Control and automatic processing during tasks requiring sustained attention: A new approach to vigilance," Human Factors, vol. 23, 1981.
[28] G. C. Prkachin, "The effects of orientation on detection and identification of facial expressions of emotion," British Journal of Psychology, vol. 94, pp. 45-61, 2003.