NBEO FAQs Regarding the December 2, 2014 PAM/TMOD Exams

1. What exactly occurred on December 2, 2014 during the PAM/TMOD exams?

According to Pearson VUE, the December 2, 2014 Part II exam publication at Pearson VUE testing centers contained an erroneous navigational setting that prevented candidates from returning to their exam once they reached the final Review Screen, whether to answer items they had skipped or to change their responses to previous questions. This erroneous navigational setting was due to a Pearson VUE coding error that was not caught during the Pearson VUE quality assurance (QA) process.

2. What steps have been taken to prevent recurrence of the same type of testing irregularity?

Pearson VUE has updated its QA scripts to rigorously evaluate the Section Navigation switch that governs this behavior. Pearson VUE has run the upgraded script on the new exams, has verified that navigation works correctly in the field, and will continue to do so for all future exams. In addition, the NBEO will review all future computer-based examinations using software that allows staff unlimited attempts to test each exam exactly as a Candidate will see it. This software gives the NBEO professional staff the opportunity to take the exam under the conditions that actually exist at the test center, so the NBEO can more accurately determine whether the exam will be administered faithfully to its standards of high quality. Beyond correcting the coding error and updating its QA practices, Pearson VUE has been responsive and supportive of the NBEO in providing follow-up remedies for affected exam candidates.

3. Were there scoring algorithm(s) that accommodated the various circumstances under which students took the examination?
As indicated in Question 1, and according to Pearson VUE, the Part II PAM exam publication contained an erroneous navigational setting that did not allow candidates to go back and review flagged items. It is because of this navigational setting error that the NBEO has worked to provide candidates with various avenues of redress. The scoring algorithm is the same algorithm used in previous Part II PAM administrations (i.e., a number-correct score: each correct response earns 1 raw point). It was not feasible to design and develop a scoring algorithm that would account for the different flagged questions across all impacted candidates on the Tuesday Part II PAM administration. It is imperative to maintain a consistent scoring method for everyone who took the December 2 administration of Part II PAM.

4. Why is the TMOD pass rate, both locally and nationally, so much lower than in prior years? The national percentages of correct scores and the standard deviations are almost identical over the last 3 years, but the pass rate is 25% lower in 2014.

The lower pass rate on the TMOD portion of the test is possibly due to several factors:

1) A new standard setting took place on January 9-10, 2015, and a new cut score (89 raw points out of a total of 112 raw points) was established for the TMOD test embedded within Part II PAM.
2) The erroneous navigational setting did not allow Candidates to review flagged items on Session 2 of the Part II PAM examination.
3) Candidate preparedness varies from year to year.

The above three factors are probable causes of the reduction in the TMOD pass rate for the December 2014 administration of Part II PAM. The NBEO could not have predicted that there would be a glitch with the navigational setting, nor could it have predicted how 23 standard setters from various parts of the country would respond to the question: Would a Minimally Qualified Candidate answer this item correctly or not? The definition of an MQC was agreed to by all of the standard setters before they looked at a single item (full case, minicase, or solo item) on the examination. Hence, it is not a change in the items that established a new definition of the MQC, because the standard setters did not review any items while deriving the definition. All data presented are taken from the Institutional aggregate reports and provided to the institutions upon request. The table below provides data related to the December (targeted) administrations of TMOD.
              Dec-10   Dec-11   Dec-12   Dec-13   Dec-13
Form               A        A        A        A        B
% corr         81.09    76.87    81.76    83.99    79.98
Std Dev         9.09     9.64     8.88     7.89     9.61
Pass Rate(%)   91.02    90.12    89.69    97.37    90.87

One can conclude that the candidates' performance did not change over the past years if, and only if, the candidates' means and variances remain the same after adjusting for form difficulty (equating the different forms of the test). The data show that performance has not been the same over the past 4 years, especially for the TMOD portion of the test. Furthermore, the TMOD test itself was not the same, and the number of items on the TMOD has varied from one administration to another.
It is expected that the means and variances for the past four targeted administrations varied, and that the pass rates varied as well. That does not mean, however, that the pass rates are dependent on the means and variances. It simply means that, after adjusting for form difficulty, the equating produced an equated cut score that varies from administration to administration as a result of variability in the ability of the pool of candidates taking the test. The December 2014 administration did not undergo equating; therefore, the pass rates were determined from the results of the standard setting study alone and cannot be compared to the previous four administrations' pass rates. The TMOD national percentages of correct scores for 2010, 2011, 2012, and 2013 are 81.09, 76.87, 81.76, and 83.99, respectively. The TMOD national standard deviations for those years are 9.09, 9.64, 8.88, and 7.89, respectively. The NBEO does not expect these values to be almost identical, because they depend on the average TMOD score for each year as well as the number of Candidates taking the TMOD. It is normal for the average to change even when the same test is administered to the same group of Candidates; as a result, the standard deviation is expected to vary from administration to administration. Once the Candidates who received their December 2, 2014 results exercised their option to have their Part II PAM scores cancelled (and were given the opportunity for a free retest), the pass rates for first-time student candidates and for all candidates were 91.7% and 89.2%, respectively. In addition, the pass rate for the embedded TMOD was 75.4%.

5. Was the reduction in performance/pass rate on the TMOD related to the examination delivery problems experienced in December 2014?
The erroneous navigational setting affected the December 2, 2014 administration of Part II PAM and the embedded TMOD items, which may have contributed to a reduced level of performance on PAM and TMOD items. It is because of this impact that the NBEO has worked to provide candidates with different avenues of redress.

6. How was the cut score established for the December 2014 PAM examination? When was it established, and how did the exam irregularities affect those final numbers?

Unlike the previous Part II examinations (12/2009 through 04/2014), the Part II PAM examination administered in December 2014 consisted of classic patient cases plus two additional item-type formats: solo items and minicases. There was a need to add these items, and updated existing items, to the established equating block of items. The standard setting facilitated this need by rendering the expanded collection of equating material into psychometrically useful representatives of the exam in terms of content and statistical specifications.
Furthermore, the last Part II/TMOD standard setting event took place in 2009. A new standard setting study was deemed necessary to determine how much knowledge is just enough for safe and effective contemporary entry-level practice. Therefore, a standard setting meeting was held on January 9-10, 2015 at the NBEO headquarters in Charlotte, NC. A total of 23 diverse OD panelists participated in the meeting. Consistent with the 2009 standard setting event, the Angoff method was used to direct the standard setting process. The erroneous navigational setting that prevented Candidates from reviewing their flagged items on Session 2 of the Part II examination had no effect whatsoever on the final numerical conclusions drawn during the standard setting event. The NBEO did not change the definition of a Minimally Qualified Candidate (MQC) or of entry-level competency. The definition and characteristics of an MQC came from the standard setters themselves; the NBEO had no role in establishing, modifying, or altering it. The primary point is that the definition was not imposed but rather represents the collective judgment of the 23 standard setters combined. The standard setting method, the judgment decision, the demographics of participants, and the training prior to conducting the study were all consistent with the December 2009 standard setting study. As in 2009, each standard setter indicated whether an MQC would answer an item correctly or not. The sum of the 23 ratings divided by the total number of raters provides a measure of the difficulty of each item on the test. Standard setting is conducted because of the items (multiple choice items based on minicases and standalone multiple choice items) that are included on the exam, and to check whether the standard has remained the same since December 2009.
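The Angoff arithmetic described above (the sum of the 23 ratings divided by the number of raters, summed over items) can be sketched as follows. The ratings in this sketch are invented for illustration; they are not actual panel data, and a real panel had 23 raters and 112 items.

```python
# Illustrative sketch of the dichotomous Angoff computation described above.
# Each rater indicates (1 = yes, 0 = no) whether a Minimally Qualified
# Candidate would answer each item correctly. These ratings are made-up
# numbers, not actual NBEO panel data.

def angoff_cut_score(ratings):
    """ratings: one list per rater, containing one 0/1 judgment per item.

    The mean of the judgments for an item estimates the probability that
    an MQC answers it correctly (a measure of item difficulty); summing
    these per-item means over all items yields the recommended raw cut score.
    """
    n_raters = len(ratings)
    n_items = len(ratings[0])
    item_means = [
        sum(rater[i] for rater in ratings) / n_raters
        for i in range(n_items)
    ]
    return sum(item_means)

# Toy example: 3 raters, 4 items (the actual study used 23 raters).
ratings = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
]
print(angoff_cut_score(ratings))
```

With these toy ratings the per-item means are 1.0, 2/3, 2/3, and 2/3, so the recommended raw cut score is 3.0 out of 4.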
We know that a portion of the test has changed, and also that the NBEO has strived to develop examinations that reflect the contemporary practice of optometry. As a result, every year the test development committees meet and develop, modify, and change the questions on the test, consistent with the NBEO's mission and with contemporary changes in the practice of optometry. Therefore, the items are constantly being updated and undergo extensive reviews, and the questions that were administered in previous standard settings are no longer the same. As an example, assume that the same group of standard setters participated in both standard settings (2009 and 2015) and that the same items were used again in 2015. It is plausible that the group of participants, on average, would assign the same ratings, or would change their assigned ratings. In our view, the ratings assigned by the standard setters are test-dependent and item-dependent. For example, if an item concerns the anatomy of the eye, a group of standard setters would likely indicate that an MQC would answer it correctly whether asked in 2009 or in 2015. On the other hand, an MQC may not have been expected to answer a basic scleral lens question correctly in 2009, but perhaps would have been expected to in 2015. Hence, the stability of a rating is item- and test-dependent. Consistency is better addressed in the way the
standard setting study was designed, where the groups underwent two rounds of ratings and their ratings were compared between rounds one and two (see the split-panel design section below). The main reason that psychometric best practice calls for holding the study post-administration is to provide the standard setters with empirical data (i.e., the difficulty of the items) prior to making judgments in round two of the study. By providing real item performance data, the judges are better equipped to make realistic judgments about the performance of an MQC.

7. Has the NBEO studied the calibration and repeatability of using a panel of experts to set the pass-fail cutoff score for the PAM examination?

Providing cut score ranges based on standard errors: If it were possible to positively identify a large group of Minimally Qualified Candidates (MQCs), they could be given the test and their scores analyzed. Since such a group would already have been identified as a cohort of minimally qualified candidates, it would be reasonable to establish a cut score based on their average score on the Part II examination. Furthermore, if all possible qualified and diverse participants had been involved in a standard setting panel, we could determine the true cut score (i.e., the average of all possible recommendations from qualified panelists). However, practical limitations force us to use the judgments of a sample of standard-setting panelists. Because a sample of diverse panelists was selected from the full population of optometrists, there is a sampling error associated with the recommendations of the panel. These ranges offer approximate 95% confidence intervals about the median panel recommendations. The interpretation of these intervals can be stated as: 95% of similarly constructed intervals will contain the true cut score.
In other words, if 100 independent panels recommended cut scores, and we calculated a confidence interval from each panel's feedback, approximately 95 of those intervals would contain the true cut score (as described above). The width of the confidence intervals is determined by: 1) the variability of the panelists' recommendations; and 2) the size of the panel; all things being equal, a larger panel yields a narrower interval. The confidence intervals are approximately 30 points wide for each of the two assigned standard setting groups and approximately 20 points wide for the full panel. We do not know which (if any) of these ranges contain the true cut score, but in order to maximize the likelihood, the cut score was selected from the score range where these ranges overlap (see Figure 1, below).
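How sampling error produces such an interval can be sketched with a simple normal approximation. The panel recommendations below are invented for illustration, and this sketch uses an interval about the mean for simplicity; the actual analysis, which reported intervals about the median, may differ in its details.

```python
# Illustrative sketch: an approximate 95% confidence interval around a
# panel's mean cut-score recommendation, using the normal approximation
# mean +/- 1.96 * (s / sqrt(n)). The recommendations below are invented
# for illustration, not actual panel data.
import math
import statistics

def cut_score_ci(recommendations, z=1.96):
    """Return (lower, upper) bounds of an approximate 95% CI."""
    n = len(recommendations)
    mean = statistics.mean(recommendations)
    se = statistics.stdev(recommendations) / math.sqrt(n)  # sampling error
    return mean - z * se, mean + z * se

# A hypothetical panel of 23 recommended raw cut scores.
panel = [85, 92, 88, 90, 87, 94, 89, 86, 91, 88, 90,
         93, 87, 89, 92, 85, 90, 88, 91, 89, 86, 93, 90]
low, high = cut_score_ci(panel)
print(round(low, 2), round(high, 2))
```

Because the standard error shrinks with the square root of the panel size, the full 23-member panel yields a narrower interval than either of the two smaller groups alone, consistent with the roughly 20-point versus roughly 30-point widths reported above.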
Utilizing a split-panel design: In keeping with recommendations in the standard setting literature (e.g., Hambleton & Pitoniak, 2006), the relatively large panel was divided into two smaller groups, each sufficiently large to produce an acceptable level of dependability according to recommendations in the literature. This permits evaluation of the stability of cut score recommendations across different, but similarly qualified and representative, standard setting panels. Participants were randomly assigned to one of the two groups by Alpine Testing Solutions, and some panelists were then switched between groups to achieve better balance on important demographic variables (e.g., mode of practice, optometry school or college, length of time since graduation, etc.). Both panels received common training and worked together during the development of the MQC definition and the standard-setting practice activities. The groups then separated and worked independently with different facilitators for the operational ratings and evaluations. The demographic composition of each group and the evaluation results were reviewed to see whether there was any reason to favor the recommendations of one group over the other. Ultimately, no compelling reason was identified, so the group results were combined to determine the recommended cut score.

8. We have received aggregate statistical reports for the December administration of Part II. However, a number of students who failed the Tuesday administration of the exam will retake it in April and have their results reported as December administration scores. Will we then receive new pass rate information for first-time takers that reflects this new data? How can we consistently report our pass rate for first-timers?

Candidates had several options available to them.
Candidates who opted to have their December 2014 Part II PAM and TMOD scores cancelled were able to do so by contacting the NBEO by February 11, 2015. Their April 2015 scores will be reported as April 2015, not December 2014, as the December scores have been cancelled from both their official score report and the database. The updated score reports and the Aggregate Statistical Reports for the December 2014 Part II PAM administration have been reissued.

9. Will the students who are retaking the exam in April for free due to the exam irregularities be considered first-time takers? Will they be reported as first-time takers in December or in April?

If a December 2014 first-time Candidate chose to have their score cancelled after receiving their results on January 21, 2015, they will be considered a first-timer in April 2015 because they will not have pre-existing Part II PAM scores.
10. Several schools have students on international rotations in April. If their scores were affected by the computer problems, will their travel be reimbursed to fly to an approved testing city, or are there Pearson VUE centers internationally?

The NBEO has never offered international computer-based testing at any site, including the Pearson VUE testing centers. Candidates who had international externships were encouraged to take the repeat Part II PAM on January 5, 2015. Candidates who did not take the January 5, 2015 administration will not be provided free travel or additional compensation.

11. What were the communications sent to students and administrators about matters related to the December tests, including but not limited to retakes, scoring, actual score reporting, and reversals of TMOD scores?

The Deans and Presidents were provided copies of all of these communications to the Candidates.

12. Why did some Candidate scores change from pass to fail or fail to pass after the scores were released? What steps will be taken to prevent that from happening again?

No scores were changed. A select few passing scores were displayed incorrectly as failing scores. The issue was related to the unofficial scores displayed online for Candidates to review. A scaled score of 74.5 was not rounded up to a 75P as the in-house scoring system is designed to do; instead, Candidates with a scaled score of 74.5 observed a 74F while viewing their unofficial scores online. Once this display rounding error was verified, the NBEO IT Department amended the online code and Candidates were notified.

13. How will the optometric community be notified of the testing problems that occurred in December? The impact on the reputations of the schools and the individual students remains a concern for future job placements, residency positions, and advancement. Will this be addressed as an open statement?
Through the support of ASCO in developing this FAQ sheet, as well as the posting of this information on the NBEO website, the information will receive broad distribution. Many Candidates have chosen to retest on April 7, 2015 and to have their previous scores cancelled.
14. Does the National Board utilize the services of an external auditor to ensure that best practices are followed in the development and administration of examinations and analysis of the results?

The NBEO employs a full-time, in-house psychometrician and routinely consults with the psychometric team at Alpine Testing Solutions to ensure that best practices are being followed at all times, for all National Board exams. In fact, well before the December 2014 PAM exams were administered, all psychometricians involved had agreed that Part II PAM was due for an updated standard setting in January 2015 because of the introduction of the new PAM/TMOD item types (solo items and minicases).

15. Has this recent experience slowed the plans to move Part I to a computerized administration? Have other testing centers been considered?

The timetable has NOT been slowed in moving to a computer-based test for Part I ABS. However, the date of this transition has not been announced because a multitude of factors must be considered, most notably the number of different forms and the number of items each form will require.

16. Are any of the NBEO staff, other than the Executive Director, authorized to respond to student questions about problems with the examinations?

To answer questions from Candidates affected by the December 2, 2014 administration of Part II PAM, all appropriate members of the NBEO staff were briefed frequently on what had occurred and assisted in answering questions by both email and telephone.

17. When the states require a 75% on TMOD, why are students with raw score percentages between 75% and 79.46% receiving a failing grade?

No optometric jurisdiction requires 75% correct to pass TMOD. The requirement since the inception of TMOD has been a scaled score of 75. In other words, the standard setting determined that the raw cut score was 89 items; that 89 is converted to a scaled score of 75.
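The distinction between raw percentages and scaled scores can be sketched with a simple linear conversion. The only anchor this FAQ provides is that 89 raw points (89/112, about 79.46% correct) maps to a scaled score of 75; the second anchor used below (112 raw mapping to 100 scaled) and the half-up display rounding from Question 12 are assumptions for illustration, not the NBEO's published formula.

```python
# Hedged sketch of a raw-to-scaled conversion and the display rounding
# described in Question 12. The FAQ confirms only that 89 raw points map
# to a scaled 75; the 112 -> 100 endpoint is an assumed anchor, and the
# NBEO's actual transformation may differ.
import math

RAW_CUT, SCALED_CUT = 89, 75     # from the standard setting study
RAW_MAX, SCALED_MAX = 112, 100   # assumed endpoint, not stated in the FAQ

def raw_to_scaled(raw):
    """Linear interpolation through the two anchor points above."""
    slope = (SCALED_MAX - SCALED_CUT) / (RAW_MAX - RAW_CUT)
    return SCALED_CUT + slope * (raw - RAW_CUT)

def display_score(scaled):
    """Round half up (as the in-house system is described as doing)."""
    rounded = math.floor(scaled + 0.5)
    return f"{rounded}{'P' if rounded >= 75 else 'F'}"

print(round(89 / 112 * 100, 2))          # 79.46 -- raw percent at the cut
print(display_score(raw_to_scaled(89)))  # 75P: exactly at the cut score
print(display_score(74.5))               # 75P: half-up rounding passes
print(display_score(74.4))               # 74F
```

This also shows why the 79.46% figure appears in the question: it is simply the raw cut of 89 expressed as a percentage of 112 items, not a separate passing standard. Note that a half-up helper is used deliberately, since Python's built-in round() rounds halves to the nearest even number and would display 74.5 as 74.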