Comparison of Three One-Question, Post-Task Usability Questionnaires
Jeff Sauro, Oracle Corporation, Denver, CO, USA
Joseph S. Dumas, User Experience Consultant, Yarmouth Port, MA, USA

ABSTRACT
Post-task ratings of difficulty in a usability test have the potential to provide diagnostic information and to serve as an additional measure of user satisfaction. But the ratings need to be reliable as well as easy to use for both respondents and researchers. Three one-question rating types were compared in a study with 26 participants who attempted the same five tasks with two software applications. The types were a Likert scale, a Usability Magnitude Estimation (UME) judgment, and a Subjective Mental Effort Question (SMEQ). All three types could distinguish between the applications with 26 participants, but the Likert and SMEQ types were more sensitive at small sample sizes. Both the Likert and SMEQ types were easy to learn and quick to execute. The online version of the SMEQ question was highly correlated with other measures and had sensitivity equal to the Likert question type.

Author Keywords
Usability evaluation, satisfaction measures, post-task ratings, sensitivity, external validity.

ACM Classification Keywords
H5.m. Information interfaces and presentation (e.g., HCI): User Interfaces Evaluation/Methodology.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CHI 2009, April 4-9, 2009, Boston, Massachusetts, USA. Copyright 2009 ACM $5.00.

INTRODUCTION
One of the measures of user satisfaction in a usability test is a post-task questionnaire or rating. Among the advantages of this measure are that it can:

- provide diagnostic information about usability issues
- measure user satisfaction immediately after the event, usually the completion of a task, potentially increasing its validity

It does not measure overall satisfaction with a product, which is the domain of post-test questionnaires such as the System Usability Scale (SUS) [1].

Because of the time constraints imposed by a measure taken after each task, researchers have worked to make post-task questionnaires brief and easy for participants to use. One of the first post-task questionnaires, the After-Scenario Questionnaire (ASQ), is composed of three rating scales in a Likert* format [3]. To assess the impact of the questionnaire, participants performed the same tasks with three software products and filled out the questionnaire after each task. The ASQ exhibited acceptable reliability and sensitivity, and the Likert format was easy for participants to use and easy for researchers to score.

A more recent study compared four variations of the Likert question type, including two of the ASQ questions [8]. In that study, each participant used one of the formats to rate tasks. All of the scales had significant correlations with task time and with a post-test SUS questionnaire. The analysis also correlated the rating types with the total set of ratings from over 1,000 respondents. The format with the highest correlation was similar to that shown in Figure 1 (this version shows 7 levels, whereas [8] had 5). That format was also the most sensitive at smaller sample sizes.
Figure 1. A variant of the Likert scale found most reliable by [8].

* We recognize that the term "Likert scale" is sometimes used to mean a scale with the exact format that Rensis Likert used and sometimes used to refer to any semantic-distance rating. We are using the term in the latter sense.

That study also included a version of Usability Magnitude Estimation (UME) [5]. With UME, users create their own scale. They assign a task rating with any value greater than zero, whether they are rating task difficulty or any other subjective dimension. The judgment, and therefore the rating, is supposed to be made on the basis of ratios: if Task 1 is rated a 10 and Task 2 is judged twice as difficult, Task 2 should be given a rating of 20. The resulting ratings are then converted into a ratio scale of the subjective dimension.
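The paper does not reproduce the conversion formula. A common approach in magnitude estimation (a minimal sketch, not necessarily the exact procedure of [5]) is geometric-mean normalization: work in log space, remove each respondent's idiosyncratic modulus, and re-center on the grand mean. The rating values below are hypothetical.

```python
import numpy as np

# Hypothetical raw UME ratings: rows = participants, columns = tasks.
raw = np.array([
    [10.0, 20.0,  5.0],
    [50.0, 90.0, 30.0],
    [ 2.0,  4.0,  1.0],
])

# Work in log space, where ratio judgments become additive offsets.
logs = np.log(raw)

# Remove each participant's personal modulus (their mean log rating),
# then re-center on the grand mean so values stay on a familiar scale.
adjusted = logs - logs.mean(axis=1, keepdims=True) + logs.mean()

# Back-transform: geometric-mean-normalized ratings on a common ratio scale.
scaled = np.exp(adjusted)
print(scaled.round(2))
```

Note that participants 1 and 3 above gave proportionally identical judgments on different personal scales; after normalization their rows agree, which is the point of the transformation.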
UME was created to overcome some of the disadvantages of Likert scales. Their closed-ended nature may restrict the range of ratings available to respondents (so-called ceiling and floor effects), and the resulting scale has questionable interval properties. But the conversion of the raw ratings is achieved with a mathematical formula, which makes UME more burdensome for some researchers than the Likert format. A UME condition was included in [8] but, in the pilot test, the moderator had difficulty explaining to participants how to make ratio judgments. That study had a remote asynchronous design, which meant that there would be no moderator-participant interaction when participants made their judgments. To simplify the rating, that study constrained the UME scale to values between 1 and 100, making the scale closed-ended. The study found that participants treated their version of UME like a Likert scale, ignoring or not understanding how to make ratio judgments.

Our organization conducts many user studies each year, and we would like at least some of them to include a post-task satisfaction measure. The studies reported here were attempts to gather solid data about the value of that measure. In Experiment 1, we conducted a small-scale study to compare the Likert and UME formats without restricting the range of the UME scale. In Experiment 2, we added another rating method that has been found to be easy to use: the Subjective Mental Effort Questionnaire (SMEQ), also referred to as the Rating Scale for Mental Effort [10,11]. It consists of a single scale with nine labels from "Not at all hard to do" to "Tremendously hard to do" (see Figure 2). In the paper version, participants draw a line through a vertical scale to indicate how much mental effort they had to invest to execute a task. The item positions in the paper format are shown as millimeters above a baseline, and the line of the scale runs from 0 to 150, leaving quite a large distance above "Tremendously hard to do," which is sometimes used by participants. Scoring the paper version of SMEQ requires measuring the distance in millimeters from the nearest vertical line marking. In previous studies, SMEQ has been shown to be reliable and easy for participants to use [2,10]. It shares some qualities with UME in that its upper bound is very open-ended, but it may be easier for users to learn than a UME judgment. What's more, the scale labels (originally in Dutch) were chosen by psychometrically calibrating them against tasks.

Figure 2. The SMEQ.

Scale Steps
One criticism of the ubiquitous Likert scale is its small number of discrete levels (usually 5 or 7). This limitation could introduce error, as respondents are forced into a few choices. The theoretical advantage of UME and SMEQ is their continuous (or at least near-continuous) range of response choices. With more choices, user sentiments can be recorded more precisely. The more scale steps in a questionnaire item the better, but with rapidly diminishing returns: as the number of scale steps increases from 2 to 20, there is an initial rapid increase in reliability, but it tends to level off at about 7 steps, and after 11 steps there is little gain in reliability from increasing the number of steps [6].
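The leveling-off that [6] describes can be illustrated with a small Monte Carlo sketch (illustrative only, not from the paper): quantize two noisy parallel measurements of the same latent attitude onto k-step scales and watch the parallel-forms correlation plateau as k grows.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000

# Latent "true" attitude plus independent measurement noise on two occasions.
true = rng.normal(0, 1, n)
form_a = true + rng.normal(0, 1, n)
form_b = true + rng.normal(0, 1, n)

def quantize(x, steps):
    """Map continuous responses onto a discrete scale with `steps` levels."""
    edges = np.quantile(x, np.linspace(0, 1, steps + 1)[1:-1])
    return np.digitize(x, edges)

for steps in (2, 3, 5, 7, 11, 21, 101):
    r = np.corrcoef(quantize(form_a, steps), quantize(form_b, steps))[0, 1]
    print(f"{steps:3d} steps: parallel-forms r = {r:.3f}")
```

Coarse 2- and 3-step scales visibly attenuate the correlation, while the gain from 11 steps onward is negligible, matching the pattern reported by [6].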
The number of steps is important for single-item assessments but is usually less important when summing scores over a number of items. Attitude scales tend to be highly reliable because their items typically correlate rather highly with one another [4]. Since we are working with single-item scales, the number of scale steps matters. The analysis by Tedesco and Tullis [8] showed that a single question is as good as or better than multiple questions in gathering post-task subjective satisfaction. In this study we intended to determine whether the gains from more continuous scales outweigh the additional burden compared to the simpler and more familiar Likert scale.

EXPERIMENT 1
In the only study we are aware of [8], a closed-ended magnitude estimation was compared against more traditional Likert-type scales. Prior to the publication of [8], we conducted a small-scale study to assess the administrative overhead of using UME as well as to investigate any advantage of UME over the more familiar Likert scales. In both question types, a higher rating meant the task was more difficult.
Method
Six users attempted seven tasks on a web-based Supply Chain Management application. The participants were experienced with both the domain and an earlier version of the application. Prior to attempting tasks, participants received practice on making ratio judgments by rating the size of circles, the training method used in [5]. Following each of the seven tasks, participants responded to both a UME question and two 7-point Likert scales (see Figure 3). The presentation order was alternated for each task.

Figure 3. Scales adapted from [3] used in Experiment 1.

Participants spoke their UME rating aloud and the moderator wrote the response down. Participants recorded their responses to the Likert scales in a web-based form.

Results
Scaled UME responses and the average of the two Likert responses correlated very highly (r = .84) with each other. Task performance measures were averaged and correlated with each scale. Both measures correlated with task completion rates (UME r = .45, Likert r = .50) and errors (UME r = .78, Likert r = .73) but not with task time. Only the correlations with errors were statistically significant (p < .05) due, we believe, to the very low power of this test. A one-way ANOVA with UME as the dependent variable found significant differences between tasks (F(3,44) = 4.83, p < .01). Differences between tasks using the Likert scales were not significant (F(3,44) = 1.54, p > .20). We found that users had some difficulty, especially early in the sessions, in grasping the ratio judgments. We were also concerned with the potential for bias caused by participants having to say their UME rating out loud.

EXPERIMENT 2
Although the results of Experiment 1 suggested that an open-ended UME has some promise, participants still found it difficult to learn to make ratio judgments. Also, asking participants to make both Likert and UME judgments after each task may have confused them about either question type. Furthermore, the UME training on circle size may not be relevant for subsequent tasks in which the difficulty of using software is being judged. In Experiment 2, we substantially increased our sample size, added more relevant training and practice on UME, included an online SMEQ, and asked participants to perform only one rating after each task.

Method
There were 26 participants who regularly submit reports for travel and expenses and were experienced computer users. The products were two released versions of a similar travel and expense reporting application, allowing users to perform the same five tasks. Ten of the participants had never used either of the applications, while 16 had used both. We wanted to see if experience with the products would affect task ratings. The tasks were:

- Create and submit for approval an expense report for attending a meeting
- Update, modify, and submit a saved report
- Change the user preference for the originating city for reports
- Find a previously approved report and its amount
- Create and submit for approval a report for expenses for a customer visit

One purpose of the study was to measure the performance and preference of experienced users. Consequently, each participant was shown a slide show demonstration of how to perform each task. They then attempted the task. The participants were not asked to think out loud.
Participants were told that we would be recording their task times, but that they should not hurry; rather, they should work at a steady pace, as they would when creating reports at work. If they made an error on a task, we asked them to repeat the task immediately. They rated the difficulty of each task using one of the three rating types and then moved on to the slide show for the next task. After completing the five tasks for one application, they moved on to the second application, following the same procedure. To minimize carry-over effects, we counterbalanced the application order so that half of the participants started with each application. We also counterbalanced the order of the rating types across tasks and products. After they had completed the tasks for both applications once, we asked participants to complete them again in the same order but without having to view the slide shows. All participants attempted the tasks a second time and gave a rating after each one. If time allowed, we asked them to attempt the tasks and ratings a third time. If participants completed all 30 task attempts, they would provide 10 response ratings for each questionnaire type, for each of the 26 participants. The large number of ratings allowed us to address the perceived difficulty of a task and to compare the sensitivity and practical constraints of the rating instruments.
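The paper does not spell out its exact counterbalancing scheme. The sketch below shows one simple rotation of the kind described; the function name and the rotation rule are assumptions for illustration only.

```python
from itertools import cycle, islice

RATING_TYPES = ["Likert", "SMEQ", "UME"]
TASKS = ["T1", "T2", "T3", "T4", "T5"]

def session_plan(participant_id: int):
    """Build one participant's (product, task, rating type) sequence.
    Alternates which product comes first and rotates the rating-type
    sequence so that task/type pairings vary across participants."""
    products = ["A", "B"] if participant_id % 2 == 0 else ["B", "A"]
    start = participant_id % len(RATING_TYPES)
    types = list(islice(cycle(RATING_TYPES), start, start + len(TASKS)))
    return [(product, task, qtype)
            for product in products
            for task, qtype in zip(TASKS, types)]

for row in session_plan(participant_id=3):
    print(row)
```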
We compared three post-task rating formats:

- The Likert scale shown in Figure 1
- A UME judgment
- An online version of SMEQ

In the UME format, participants assigned a positive number to the task difficulty by entering it into an online form; the more difficult the task, the higher the value. While there is technically no lower bound to UME, in use it can have lower-bound limitations if the participant assigns the first task a low value. To avoid this problem, the rating procedure is often designed to leave room at the lower end. In this study, we first asked participants to perform a very simple baseline task: clicking on a search icon (see Figure 4).

Figure 4. The UME baseline task.

Figure 5. The post-task UME rating.

On subsequent tasks in which the UME format was used, participants were asked to rate task difficulty relative to this simple baseline task, to which we assigned a difficulty of 10 (see Figure 5). As noted above, some participants have a difficult time understanding the nature of ratio judgments [8]. Concepts such as "twice as difficult" and "one-half as difficult" take training and feedback to understand. Consequently, we gave participants two practice exercises on UME. Both exercises required making a difficulty rating of a software use task, one easy and one more difficult. We expected that these tasks would provide more relevant practice than rating the size of circles. During the practice, the moderator explained the meaning of their rating to participants. For example, if the participant gave a practice task a rating of 20, the moderator would say, "That means the task was twice as difficult as the baseline search task you did earlier."

For the SMEQ, we created the online version shown in Figure 6. In its paper version, the vertical scale is standardized at 15 centimeters high, filling most of a printed page. In our online version, we made each millimeter equal to 2.22 pixels, resulting in a scale large enough to fill most of the browser window on a 1024x768 pixel resolution monitor. Participants moved the slider with a mouse to the point on the scale that represented their judgment of difficulty. The slider widget provided the researcher with the scale value to the thousandth decimal. As with all three question types, a more difficult task should be given a higher value.

Figure 6. The online SMEQ.

At the end of the session, participants were asked to complete the ten-item SUS for each application.
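Scoring the online SMEQ reduces to converting the slider's pixel position back onto the 0-150 millimeter scale of the paper version. A minimal sketch, assuming the 2.22 pixels-per-millimeter factor reported above (the function name is hypothetical):

```python
MM_PER_SCALE = 150       # paper SMEQ: 0-150 mm vertical scale
PIXELS_PER_MM = 2.22     # scaling used for the online version

def smeq_score(slider_pixels: float) -> float:
    """Convert a slider position in pixels (0 = bottom of the scale)
    back to the 0-150 SMEQ value, reported to three decimals as the
    online widget did."""
    return round(slider_pixels / PIXELS_PER_MM, 3)

# Example: a mark two-thirds of the way up the 333-pixel-tall scale.
print(smeq_score(222.0))   # -> 100.0
```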
Results
There were no significant differences between the sample of ten users with no experience with either product and the 16 who had experience with both. Consequently, we have combined them into one group of 26 for all the analyses reported in this paper.

Table 1. Mean task times (standard deviations) in seconds for the third errorless trial for the two products, by task.

Overall, Product A was more difficult to use than Product B. Table 1 shows the average task times for the two products for the third errorless trial. The sample sizes differ because not all of the participants completed all three trials on both products. All of the mean differences were significant by t-test (p < .01) except for Task 4, which was not significant. Figure 7 shows that the post-test SUS ratings also favored Product B.

Diagnostic Value of Post-Task Rating Scales
Figure 7 shows the mean SUS scores for both products; higher values indicate higher perceived usability. The large gap between the two means makes a very strong case that Product A is perceived as less usable than B. But if one then hopes to improve the usability of Product A, the SUS scores provide little help in identifying which features and functions are contributing most to the poor ratings. It is unclear from the scores alone what is causing this difference in perception.

Figure 7. Mean and 95% CI for System Usability Scale scores for Products B and A (n = 26).

In examining the perceived difficulty of individual tasks, Figures 8a-c show the differences in responses by questionnaire type. When the lower boundary of the confidence interval is above the 0 point (meaning no difference), the observed difference is greater than chance (statistically significant). The graphs are oriented so that a positive difference indicates higher perceived difficulty for Product A. Even the least sensitive of the scales, UME (see Figure 8b), discriminated among tasks. Tasks 1 and 2 show a larger gap from the 0 boundary, versus less difference between products with respect to the actions and functions encountered for Tasks 3, 4, and 5. Figures 8a and 8c show the better discrimination for SMEQ and Likert, respectively, as only Task 4 showed no significant difference in perceived difficulty (T4 crosses the 0 threshold). Tasks 1 and 2 appear to be the largest drivers of the lower perceived usability between products. If one needed a prioritized list of issues in hopes of improving Product A, then Tasks 1 and 2 would be the first ones to examine.

Figure 8a. Differences between products by task and 95% CI for SMEQ. Dashed line shows the 0 boundary, meaning no difference in difficulty between products for that task (T4).

Figure 8b. Differences between products by task and 95% CI for UME. Dashed line shows the 0 boundary, meaning no difference in difficulty between products for that task (T3, T4, T5).

Figure 8c. Differences between products by task and 95% CI for Likert. Dashed line shows the 0 boundary, meaning no difference in difficulty between products for that task (T4).
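The intervals in Figures 8a-c are standard paired-difference confidence intervals. A sketch of the computation for one task follows; the ratings in the example are hypothetical, not the study's data.

```python
import numpy as np
from scipy import stats

def diff_ci(ratings_a, ratings_b, alpha=0.05):
    """95% CI for the mean within-subject difference (Product A - Product B)
    on one task. A lower bound above 0 mirrors Figures 8a-c: the task was
    rated significantly harder on Product A."""
    d = np.asarray(ratings_a, float) - np.asarray(ratings_b, float)
    m, se = d.mean(), stats.sem(d)
    t_crit = stats.t.ppf(1 - alpha / 2, df=len(d) - 1)
    return m - t_crit * se, m + t_crit * se

# Hypothetical SMEQ ratings for one task from six participants.
print(diff_ci([55, 70, 40, 62, 48, 75], [30, 25, 35, 28, 40, 22]))
```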
Sensitivity Resampling
To assess the sensitivity of each question type, we took repeated random samples with replacement at sample sizes of 3, 5, 8, 10, 12, 15, 17, 19, and 20 and compared the means for the two products using a paired t-test. We used the number of means that were significantly different (p < .05) favoring Product B's tasks as the dependent variable, with sample size, task, and questionnaire as the three independent variables. The more sensitive a questionnaire type is, the more readily it can detect significant differences between products at smaller sample sizes. The results are displayed in Figures 9 through 12. A two-way ANOVA (question type, sample size) showed a significant difference among questionnaires (p < .05). A post-hoc comparison using the Bonferroni correction suggested that the only significant difference was between SMEQ and UME (p < .05). Figure 9 graphically confirms that SMEQ had higher sensitivity than UME; SMEQ did not differ significantly from Likert, and Likert was not significantly more sensitive than UME.

Figure 9. Overall sensitivity across sample sizes and tasks. Graph shows the mean number of re-samples found significant. Error bars are 95% confidence intervals with Bonferroni correction.

Figure 10. Sensitivity by sample size by question type. Y-scale shows the number of significantly different means across re-samples.

Figure 11. Sensitivity by sample size by question type. Graph shows the mean and 95% confidence intervals (L = Likert, S = SMEQ, U = UME).

Figure 12. Difference between tasks. Graph shows the mean and 95% confidence intervals (L = Likert, S = SMEQ, U = UME) for the four tasks that had a significant difference for the full dataset.
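The resampling procedure itself can be sketched as follows. The paper describes the scheme but not its code; the re-sample count, the data, and the helper name below are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def sensitivity(ratings_a, ratings_b, n_sub, n_resamples=100, alpha=0.05):
    """Draw paired ratings with replacement at a given sample size and
    count how often a paired t-test flags the product difference as
    significant in the expected direction (A rated harder than B)."""
    a = np.asarray(ratings_a, float)
    b = np.asarray(ratings_b, float)
    hits = 0
    for _ in range(n_resamples):
        idx = rng.integers(0, len(a), size=n_sub)   # resample participants
        _, p = stats.ttest_rel(a[idx], b[idx])
        hits += (p < alpha) and (a[idx].mean() > b[idx].mean())
    return hits

# Hypothetical per-participant task ratings for Products A and B (n = 26).
prod_a = rng.normal(60, 15, 26)
prod_b = rng.normal(45, 15, 26)
for n_sub in (3, 5, 8, 10, 12, 15, 17, 19, 20):
    print(n_sub, sensitivity(prod_a, prod_b, n_sub))
```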
Figures 10 and 11 show that UME lags the other two question types at sample sizes larger than 8, most notably for tasks T3 and T5 (see Figure 12). There was no significant interaction between sample size and rating type. SMEQ held a slight advantage over Likert, although not one large enough to be statistically significant given the sample size.

External Validity
With only one question type per task, we could not evaluate internal consistency using Cronbach's alpha. We did test how well each rating type correlated with the performance measures taken at the task level (n = 10: five tasks for two products). Each participant in this study also filled out the SUS post-test questionnaire for each of the two products. We correlated the mean post-task satisfaction ratings by participant with the SUS score by participant (n = 52: 26 users rating two products). For a discussion of the correlations between performance measures and post-test and post-task rating scales, see [7]. The results of all the correlations are shown in Table 2.

Table 2. Correlations (r) between rating types and SUS, task times, completion rates, and errors. Asterisked correlations are significant at the p < .05 level; all other correlations are significant at the p < .01 level.

The correlations done at the user summary level (the UAO aggregation level; see [7]) showed correlations with SUS of around .6 for SMEQ and Likert and a lower .3 for UME (the differences between those correlations were not statistically significant). Both SMEQ and Likert showed statistically higher correlations than UME for completion rates and errors, respectively (p < .01). The size and significance of the correlation between SMEQ and task time is consistent with published data [2,10]. Tests were performed using Fisher-z transformed correlations.

Finally, we compared the formats against each other. Because only one questionnaire was administered after each task, we could not correlate at the observation level; instead, we aggregated the measures at the task level (n = 10; the TAO aggregation level, see [7]), as shown in Table 3. In spite of UME's lack of sensitivity, it correlated significantly with the other types, as it did in a previous study [8].

Table 3. Correlations between rating types at the task level; the Likert-SMEQ correlation was .94. All correlations are significant at the p < .01 level.
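The Fisher-z procedure mentioned above is a standard test for a difference between correlations. A sketch for two independent correlations, using the reported approximate SUS correlations of .6 and .3 at n = 52 as the example inputs:

```python
import math
from scipy import stats

def compare_correlations(r1, n1, r2, n2):
    """Two-tailed test for a difference between two independent
    correlations using Fisher's z transformation."""
    z1 = math.atanh(r1)   # Fisher z of each correlation
    z2 = math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    return z, 2 * stats.norm.sf(abs(z))

# Approximate SUS correlations reported above: ~.6 vs ~.3 at n = 52.
print(compare_correlations(0.6, 52, 0.3, 52))
```

With these inputs the test returns z of about 1.9 and p of about .06, consistent with the statement above that the difference between the SUS correlations was not statistically significant.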
SCALE BOUNDARY EFFECTS
The theoretical advantage of UME and SMEQ is their continuous levels. UME has no upper bound. SMEQ has an upper limit, but it is placed substantially above the highest label, providing additional upward choices (indicating extremely high mental effort). We wanted to see if participants exploited these advantages across the two products in this study. Since the Likert scale had a maximum of 7 points, we know this is the greatest number of choices one user could express across the task rating opportunities. We wanted to know, given a larger spectrum of choices, whether users would take advantage of them, thus providing evidence that the Likert scale is in fact artificially constraining their judgments.

Figure 13. Distribution of UME scores (n = 231) showing a total of 26 distinct scale choices.

Figure 14. Distribution of SMEQ scores (n = 236) showing a total of 211 distinct scale choices (two SMEQ scale steps per column).

Figure 15. Distribution of Likert scores (n = 234) showing a total of 6 distinct scale choices.

Figures 13, 14, and 15 show the distribution of responses for each question type. Only 6 of the 7 options were selected on the Likert scale, while 26 and 211 different scale choices were used in UME and SMEQ, respectively. Grouping the scale choices by participant, at most there could be 10 distinct choices (one for each task rating). Figure 16 shows the means and 95% Bonferroni-corrected confidence intervals. The mean number of choices participants selected across the tasks was 3.7 for Likert (SD 1.2), 5.3 for UME (SD 1.9), and 9.9 for SMEQ (SD 0.19). For SMEQ, only one user selected the same response twice, an unlikely event considering that the slider allowed for responses to the thousandth decimal.

Figure 16. Mean number of choices selected by users across tasks by questionnaire type. Error bars represent the 95% Bonferroni-corrected confidence interval.

A one-way ANOVA confirmed the significant difference between means indicated by the confidence intervals (F(2,75), p < .01). For UME, this data showed that despite having an infinite number of scale steps, the average participant used around 5 choices: more than the 3.7 for the Likert scale but still around half as many as for SMEQ. The online slider widget, however, allowed for responses to the thousandth decimal, a degree of accuracy greater than the millimeter tolerance of the paper-based SMEQ. To reduce the effect of trivial decimal differences between responses (e.g., two responses that differ by a fraction of a scale step), we rounded the SMEQ responses to the nearest integer to see how this affected the number of distinct choices. In effect, this reduced SMEQ to a 150-point scale like the paper version. Figures 17 and 18 show the integer data. The mean number of distinct responses by user dropped from 9.9 to 7.9. The differences among questionnaire types still remained statistically significant, even with roughly 20% fewer choices per user for SMEQ.

Figure 17. Mean number of distinct choices selected by users across tasks by questionnaire type, with SMEQ values restricted to integers only. Error bars represent the 95% Bonferroni-corrected confidence interval.

Figure 18. Distribution of SMEQ scores (n = 236) restricted to the nearest integer, showing 48 distinct scale choices (two SMEQ scale steps per column).
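Counting distinct scale choices, with and without the integer rounding applied to SMEQ, is straightforward. A sketch with hypothetical responses (the helper name is an assumption):

```python
import numpy as np

def distinct_choices(responses, to_integer=False):
    """Count how many distinct scale values a participant used across
    their task ratings; optionally round first, as was done above to
    collapse the slider's three-decimal precision onto integer steps."""
    r = np.asarray(responses, float)
    if to_integer:
        r = np.round(r)
    return len(np.unique(r))

# Hypothetical SMEQ responses from one participant's ten task ratings.
smeq = [10.213, 10.874, 23.402, 23.391, 47.115, 9.954, 31.06, 58.7, 10.4, 23.0]
print(distinct_choices(smeq))                   # 10 distinct raw values
print(distinct_choices(smeq, to_integer=True))  # 6 after rounding
```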
DISCUSSION
The task times, error data, and subjective ratings showed that most of the five tested tasks were more difficult on Product A. The high correlations between the post-task ratings and the other measures showed that participants' performance and their perceptions of task difficulty agree. With a sample size of 26 participants and hundreds of task ratings, any of the three question types discriminated between many product tasks and correlated highly both with performance and with the post-test SUS questions. A tester who used any one of these types would conclude that participants perceived Product A's tasks as more difficult to complete. But a closer look shows interesting differences between the three types.

The results of the sensitivity analysis are consistent with a previous study [8]. It is not until participant sample sizes reach the 10-12 level that post-task ratings begin to discriminate significantly between product tasks at an 80% or better level; that is, in this study, 80% of the task means for Product A were significantly higher than for Product B (see Figures 10-12). All of the scales used in both studies showed low sensitivity at the small sample sizes typically used in usability testing. Furthermore, [9] reported a similar sensitivity for post-test questionnaires. It appears that wherever questionnaires and ratings are used during usability test sessions, they may be unable to reliably discriminate even between products with large differences in usability at sample sizes below 10.

Both the SMEQ and Likert question types had significantly higher percentages than UME for detecting mean task differences at sample sizes greater than eight. The SMEQ percentages were higher than the Likert percentages, but not significantly so. The UME type asymptotes at about a 60% detection rate, with little gain as sample sizes increase from 10 to 20 participants (see Figure 10).

UME did more poorly than expected in Experiment 2. It had lower sensitivity than the other types and smaller correlations with all of the other measures. The primary reason appears to be participants' inability to understand the concept of ratio judgments in a usability context, which is similar to a previous finding [8]. In spite of providing participants in Experiment 2 with training, two practice trials rating software use difficulty, an established baseline value, and only one rating per task, most participants' ratings were clustered around the baseline value of 10. While UME has no theoretical upper limit, participants were using it like a closed-ended scale. The anecdotal impression of the moderator was that participants were not making ratio judgments. Because magnitude estimation has worked in other contexts, perhaps we needed to be clearer about what a concept such as "twice as difficult to use" means, or provide more training. But those procedures would add more time to test sessions.

The number of different choices used with SMEQ was about twice the number used for UME and three times the number used for the Likert question type. But the SMEQ values still clustered close to the value of 10. We suspect that the baseline value of 10 used for the UME type influenced participants' use of the SMEQ scale. This carry-over effect is a price we paid for the higher power gained from the within-subjects design. Two participants spontaneously said that they used 10 as a baseline for both UME and SMEQ. It is possible that if participants had used only SMEQ, the number of choices used would increase. It would be valuable to repeat this study with a between-subjects design.

The Likert question performed quite well in this study. Participants used it with little or no explanation, and it was easy for us to score. It allowed participants only seven levels, but the vast majority of participants used only five (and no participant used all seven). Statistically it was as sensitive as SMEQ and had high correlations with all of the other measures. These results justify its popularity as a measure of the difficulty of software use.

The SMEQ question also performed well. Our online version with the slider was easy for participants to use and for us to score, was as sensitive as the Likert question, had high correlations with other measures, and allowed participants to use a larger range of choices. Our anecdotal impression was that participants enjoyed using it more than the other question types.
This is the first study we are aware of in which SMEQ has been used in an online format. We do not know if the paper format will yield similar results, but we have no reason to believe it would fail to do so.

Finally, it is interesting that these question types were sensitive to task and product differences even when the participants were trained on how to perform the tasks and then performed them for three errorless trials. Perhaps one of the reasons for the high correlations between the post-task question types and the other measures was the fact that all participants were experienced computer users and were repeating the same tasks. However, participants were still making errors and hesitating on their third attempt. Not only do some usability problems persist over repeated trials, but users are sensitive to their presence. Subjective ratings reflect the presence of ease-of-use problems as well as initial ease-of-learning problems.

CONCLUSION
Post-task questions can be a valuable addition to usability tests. They provide additional diagnostic information that post-test questionnaires do not, and it does not take much time to answer one question. This study and [8] both showed high correlations with other measures, which is evidence of concurrent validity. With sample sizes above 10-12, any of the question types used in this study (and in [3]) yield reliable results. But below 10 participants, none of the question types have high detection rates, nor do the common post-test questionnaires [9].

The popular Likert question was easy for participants to use, was highly correlated with other measures and, with a seven-point format, did not show a ceiling effect for these tasks and products. What's more, it was easy for test administrators to set up and administer in electronic form.

The SMEQ question showed good overall performance. In its online version, it was easy to learn to use, was highly correlated with other measures, and had sensitivity equal to the Likert question. One drawback was the requirement of a special web-interface widget, but the widget made the question easy to score.

Participants had difficulty learning to use the UME question type, confirming the findings in [8]. It was less sensitive than the other question types and had lower correlations with other measures. Given the positive results in some
other studies [5] and the long history of magnitude estimation applied to other stimuli, a different training and practice procedure than the one we used may yield better results.

In addressing our question of whether the benefits of using SMEQ or UME outweigh the additional burden they bring compared to a Likert scale, this analysis suggests the answer is "perhaps" for SMEQ and "probably not" for UME.

REFERENCES
1. Brooke, J. (1996). SUS: A Quick and Dirty Usability Scale. In P.W. Jordan, B. Thomas, B.A. Weerdmeester & I.L. McClelland (Eds.), Usability Evaluation in Industry. London: Taylor & Francis.
2. Kirakowski, J. & Cierlik, B. (1998). Measuring the Usability of Web Sites. Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting.
3. Lewis, J. R. (1991). Psychometric evaluation of an after-scenario questionnaire for computer usability studies: The ASQ. SIGCHI Bulletin, 23(1).
4. Lewis, J. R. (2002). Psychometric evaluation of the PSSUQ using data from five years of usability studies. International Journal of Human-Computer Interaction, 14.
5. McGee, M. (2004). Master Usability Scaling: Magnitude Estimation and Master Scaling Applied to Usability Measurement. Proc. CHI 2004, ACM Press.
6. Nunnally, J. C. (1978). Psychometric Theory. New York: McGraw-Hill.
7. Sauro, J. & Lewis, J.R. (2009). Correlations among Prototypical Usability Metrics: Evidence for the Construct of Usability. Proc. CHI 2009, in press.
8. Tedesco, D. & Tullis, T. (2006). A Comparison of Methods for Eliciting Post-Task Subjective Ratings in Usability Testing. Usability Professionals Association (UPA), 2006.
9. Tullis, T. & Stetson, J. (2004). A Comparison of Questionnaires for Assessing Website Usability. Usability Professionals Association (UPA), 2004.
10. Zijlstra, F. (1993). Efficiency in work behavior: A design approach for modern tools. PhD thesis, Delft University of Technology. Delft, The Netherlands: Delft University Press.
11. Zijlstra, F.R.H. & Doorn, L. van (1985). The construction of a scale to measure subjective effort. Technical Report, Delft University of Technology, Department of Philosophy and Social Sciences.