PEER REVIEW HISTORY

BMJ Open publishes all reviews undertaken for accepted manuscripts. Reviewers are asked to complete a checklist review form (http://bmjopen.bmj.com/site/about/resources/checklist.pdf) and are provided with free text boxes to elaborate on their assessment. These free text comments are reproduced below.

ARTICLE DETAILS

TITLE (PROVISIONAL): A Retrospective Analysis of the Effect of Discussion in Teleconference and Face-to-Face Scientific Peer Review Panels
AUTHORS: Carpenter, Afton; Sullivan, Joanne; Deshmukh, Arati; Glisson, Scott; Gallo, Stephen

VERSION 1 - REVIEW

REVIEWER: Fogelholm, Mikael, University of Helsinki
REVIEW RETURNED: 19-Jul-2015

GENERAL COMMENTS: This is a very interesting paper on panel peer review. The results should be important and meaningful for all research grant organizations. Although the paper is based on non-randomized data with slightly different procedures in different years, the authors have done everything they could to make the 2009-10 and 2011-12 evaluations as comparable as possible.

Let me start by summarizing how I understood the peer review process was carried out. If I have misunderstood something, the authors may want to check the description. Out of the 7-12 panel members, two are nominated as assigned reviewers, one as primary and another as secondary reviewer (I didn't really understand whether there was a meaningful difference between the primary and secondary reviewer, other than one is simply called primary and the other secondary). They read the paper before the meeting and also give a preliminary score. Before the meeting, the panel members give scores only to the assigned papers. I was uncertain whether the panelists see the preliminary scores given by the assigned reviewers. Then the panel meets, face-to-face or by teleconference. There is a discussion, and after this all panelists (without a COI) score the paper. The mean value of all panelists' scores is also the final score of the paper.

I have only minor comments on the manuscript:

1) One issue I was missing was an analysis of the timing of discussion. There are some data suggesting that, e.g., morning discussions could be longer and more thorough, while in the afternoon the panelists become tired and hence might reach consensus faster simply because they want to finish the day. However, perhaps these data were not available.

2) In the abstract, the phrase "important for at least 10% of the applications" (line 32) is used. What does "important" really mean here? If this refers to about 10% of the applications being shifted from potentially non-fundable to potentially fundable, this change is certainly important and pleasant for the grant applicants. However, a shift from potentially fundable to potentially non-fundable is equally important, albeit really unpleasant for the applicant. Perhaps another word could be more suitable. The authors may also want to consider both moves as interesting and meaningful.

3) In Table 1, I could not understand why the letters MF are used to indicate the differences between the post-discussion scores by the assigned reviewers.

4) Table 2 is a little difficult to understand without reading the text. Perhaps the legend could explain a little more.

5) The difference in average discussion time between face-to-face and teleconference settings was not very large, in my mind. Is this something which warrants a comment?

REVIEWER: Mutz, Ruediger, ETH
REVIEW RETURNED: 21-Jul-2015

GENERAL COMMENTS: The manuscript reports the results of a retrospective study that aims to examine the effects of discussion in face-to-face versus teleconference settings of grant peer review panels, measured by changes in application scoring between the pre-meeting and post-discussion stages. 260 and 212 applications for the face-to-face and teleconference settings, respectively, were included in the analysis. Overall, only small differences between the two panel models were found. In my view the manuscript would merit publication in BMJ, but the final recommendation for publication depends on the revision, which should address the following aspects:

- Missing concept: The study examined differences between face-to-face and teleconference settings in grant peer review panels, especially the differential impact of discussion on final scores. In the beginning, however, it was not sufficiently explained what the differences between these two modes of grant peer review panels really are, and in which way these differences might moderate the effect of discussion on final post-meeting scores. A theoretical concept is missing that would give an idea of where, and to what extent, differences in the measured variables (e.g., evaluation scores) could be presumed. Only in the final discussion were a few results from research on teleconference settings added. An alternative to a theoretical concept could be the bias concept in peer review research: the mode of peer review should not affect the scores (null hypothesis: no difference). In my view, the revision should elaborate more on the two different settings of grant peer review panels, for instance by adopting concepts either from research on teleconference settings or from bias research in grant peer review.

- Causal inference: Ultimately, the reported study examined group differences, especially the different impact of discussion in two different grant peer review settings. To make such a causal inference, the similarity between the two groups must be guaranteed (ceteris paribus condition). The manuscript mentioned that the reviewers in the face-to-face and the teleconference settings were similar with regard to their demographics, but without any statistical tables. It is usual in statistics to report some sample characteristics, if not in the manuscript itself, then at least in the supporting information. Second, the groups might also differ with respect to the properties of the applications. Nothing was said in the manuscript about pre-discussion differences in the grant applications of the two modes of grant peer review panel. By calculating difference scores, the problem of absolute differences in pre-meeting scores might vanish, but differences in other properties might still remain. For example, the two groups might differ in the inter-rater reliability of the referees' ratings of a grant application. If the inter-rater reliability was high, the application would be less contentious; larger score shifts following discussion could be presumed. Thus, the properties of the grant applications, rather than the discussion or the panel setting, might have provoked the difference. I think the revision should make clear that the grant applications were also quite similar in selected properties across the two settings of grant peer review panels (face-to-face and teleconference).

- Inter-rater reliability: In grant peer review research it is also usual to report the inter-rater reliability, overall and separately for the two settings of grant peer review panel.

- Statistics: I suppose that a paired-samples t-test was used to test for differences in means between pre-meeting scores and post-discussion scores. In the revision it should be clarified which test statistic was used.

- Results: The results section is somewhat difficult to understand. Certain numbers in the text cannot be replicated with numbers in the tables. Whereas, for instance, the 38.8% of primary reviewer scores (p. 6) that did not change can be replicated in Table 2 (last row), the 18.5% of scores that shifted to a better score seems to have no counterpart in Table 2, although a reference to Table 2 was made after the sentence "Examining ΔPRI showed that 38.8%..." (p. 6, last section). It would be helpful if, as an example, a row of Table 2a/2b could be explained in the text, including an explanation of the categories. Further, given the absolute values presented in Table 2, it is not clear to me how to arrive at a statement about something that gets worse: "However, if reviewers did change their score, both primary and secondary scores were more likely to become poorer...".

Minor:
- What is meant by primary and secondary pre-meeting scores?
- Primary and secondary reviewer scores should be mentioned on page 4 not only for the pre-meeting scores, but also for the post-discussion scores, as they were used in the Approach section and Table 1.

VERSION 1 - AUTHOR RESPONSE

Reviewer 1

1) "I didn't really understand whether there was a meaningful difference between the primary and secondary reviewer, other than one is simply called primary and the other secondary."

We've added some details under the Peer Review section of the Methods that should help clarify the differences between a primary and secondary reviewer.

2) "One issue I was missing was an analysis of the timing of discussion. There are some data suggesting that, e.g., morning discussions could be longer and more thorough, while in the afternoon the panelists become tired and hence might reach consensus faster simply because they want to finish the day. However, perhaps these data were not available."

We appreciate this interesting comment. It should be noted that morning/afternoon discussions are not as easy to discern for teleconference panels as they are for face-to-face panels, because teleconference panels often meet at varying hours of the day. However, we did look into a bulk summary average of morning/afternoon (face-to-face) and first half of the day/second half of the day (teleconference) discussions for both settings. There was no real discernible difference observed. We have included a brief reference to this in the Application discussion time section of the Results.

3) "In the abstract, the phrase 'important for at least 10% of the applications' (line 32) is used. What does 'important' really mean here? If this refers to about 10% of the applications being shifted from potentially non-fundable to potentially fundable, this change is certainly important and pleasant for the grant applicants. However, a shift from potentially fundable to potentially non-fundable is equally important, albeit really unpleasant for the applicant. Perhaps another word could be more suitable. The authors may also want to consider both moves as interesting and meaningful."

Thank you for pointing this out. In fact, as the reviewer indicated, moving in either direction over the funding line is meaningful, and that is what we were referring to. We've clarified this statement in the abstract.

4) "In Table 1, I could not understand why the letters MF are used to indicate the differences between the post-discussion scores by the assigned reviewers."

We've changed MF to PD (post-discussion), which we hope is a bit clearer.

5) "Table 2 is a little difficult to understand without reading the text. Perhaps the legend could explain a little more."

We've expanded the legend for Table 2. We also added a few sentences explaining the table.

6) "The difference in average discussion time between face-to-face and teleconference settings was not very large, in my mind. Is this something which warrants a comment?"

Based on the findings offered in this paper, the difference in discussion time appears not to be an important factor when it comes to reviewer contentiousness or the effect of discussion. However, it is consistent with our previous PLOS ONE findings that teleconference panels, in general, have shorter discussion times. We've included a sentence under the Application discussion time section of the Results that addresses this.

Reviewer 2

1) "Missing concept: The study examined differences between face-to-face and teleconference settings in grant peer review panels, especially the differential impact of discussion on final scores. In the beginning, however, it was not sufficiently explained what the differences between these two modes of grant peer review panels really are..."

We've included a paragraph, as well as a reference (Zheng et al), in the Introduction & Background section that addresses the major differences between the two settings. A crucial difference that can often be overlooked is the development of trust among panel members. This is fostered in face-to-face meetings through shared experiences, visual social cues, and even socializing during panel breaks. These opportunities are reduced in teleconference panels.

2) "Causal inference: Ultimately, the reported study examined group differences, especially the different impact of discussion in two different grant peer review settings. To make such a causal inference, the similarity between the two groups must be guaranteed (ceteris paribus condition). The manuscript mentioned that the reviewers in the face-to-face and the teleconference settings were similar with regard to their demographics..."

We have provided a summary in the Peer Review section of the Methods that includes information on reviewer demographics, including reviewer rank and degree.

3) "Inter-rater reliability: In grant peer review research it is also usual to report the inter-rater reliability, overall and separately for the two settings of grant peer review panel."

We've included information on the ICC for each year (pre-meeting and post-discussion), as well as for each review setting, in a supplemental table (Table S1), and we refer to the table in the text under Application score shifts. Regardless of setting, our results demonstrate that, as would be expected, there is higher reliability between the assigned reviewer scores following discussion.

4) "Statistics: I suppose that a paired-samples t-test was used to test for differences in means between pre-meeting scores and post-discussion scores. In the revision it should be clarified which test statistic was used."

We did not originally perform a paired t-test on the primary and secondary reviewer pre-meeting and post-discussion scores. However, we have gone back and performed paired t-tests on the primary and secondary reviewer scores for both settings. Our findings are included in the Application score shifts section of the Results, alongside the ICC. When looking at differences in ΔPRI, ΔSEC, and ΔA between settings, we utilized unpaired t-tests of unequal variance, as paired t-tests are not possible for these groups.

5) "Results: The results section is somewhat difficult to understand. Certain numbers in the text cannot be replicated with numbers in the tables..."

We've included a few sentences in the text near Table 2 explaining Table 2 as well as Tables S2 and S3.

6) "What is meant by primary and secondary pre-meeting scores?"

We've included some clarifying information on the pre-meeting scores in the Peer Review section of the Methods.

7) "Primary and secondary reviewer scores should be mentioned on page 4 not only for the pre-meeting scores, but also for the post-discussion scores, as they were used in the Approach section and Table 1."

The primary and secondary post-discussion (PD) scores are investigated via ΔPD (originally ΔMF). Specifically, ΔPD is examined under the Contentiousness and effect of discussion section. However, we have added a sentence that reports the median ΔPD for each setting.
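
For readers who want to see how the statistical comparisons discussed above fit together, the following is a minimal illustrative sketch in Python: a paired t-test of pre-meeting versus post-discussion scores within one setting, an unpaired t-test of unequal variance (i.e., Welch's test) comparing score shifts between settings, and an ICC for inter-rater reliability between the two assigned reviewers. The data, sample sizes, scoring scale, variable names, and the use of the pingouin package for the ICC are all assumptions made for illustration; this is not the authors' analysis code.

# Illustrative sketch only: hypothetical scores, not the study data or the authors' scripts.
import numpy as np
import pandas as pd
from scipy import stats
import pingouin as pg  # assumed to be available; provides intraclass_corr()

rng = np.random.default_rng(0)
n_apps = 50  # hypothetical number of applications in one review setting

# Hypothetical pre-meeting and post-discussion scores from the two assigned reviewers.
pri_pre = rng.uniform(1.0, 5.0, n_apps)
pri_post = pri_pre + rng.normal(0.0, 0.4, n_apps)
sec_pre = pri_pre + rng.normal(0.0, 0.5, n_apps)
sec_post = sec_pre + rng.normal(0.0, 0.4, n_apps)

# Score shift following discussion (post-discussion minus pre-meeting), e.g. delta-PRI.
delta_pri = pri_post - pri_pre

# 1) Paired t-test: did discussion shift the primary reviewer's scores within this setting?
t_paired, p_paired = stats.ttest_rel(pri_post, pri_pre)

# 2) Welch's unpaired t-test: compare score shifts between the two settings
#    (the second group here is a hypothetical stand-in for the other setting).
delta_pri_other = rng.normal(0.05, 0.45, 42)
t_welch, p_welch = stats.ttest_ind(delta_pri, delta_pri_other, equal_var=False)

# 3) Inter-rater reliability (ICC) between primary and secondary reviewer scores;
#    the same call can be repeated for pre-meeting and post-discussion scores.
long_scores = pd.DataFrame({
    "application": np.tile(np.arange(n_apps), 2),
    "reviewer": ["primary"] * n_apps + ["secondary"] * n_apps,
    "score": np.concatenate([pri_post, sec_post]),
})
icc = pg.intraclass_corr(data=long_scores, targets="application",
                         raters="reviewer", ratings="score")

print(f"paired t = {t_paired:.2f}, p = {p_paired:.3f}")
print(f"Welch t = {t_welch:.2f}, p = {p_welch:.3f}")
print(icc[["Type", "ICC"]])

The same pattern would simply be repeated for ΔSEC and ΔA and for each review setting; which ICC variant corresponds to the values reported in the supplemental Table S1 is not specified in this record.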

VERSION 2 - REVIEW

REVIEWER: Fogelholm, Mikael, University of Helsinki
REVIEW RETURNED: 12-Aug-2015

GENERAL COMMENTS: The reviewer completed the checklist but made no further comments.