PROCESS-ORIENTED AND PRODUCT-ORIENTED ASSESSMENT OF EXPERIMENTAL SKILLS IN PHYSICS: A COMPARISON

Nico Schreiber 1, Heike Theyßen 1 and Horst Schecker 2
1 University of Duisburg-Essen
2 University of Bremen

Abstract: The acquisition of experimental skills is widely regarded as an important part of science education. Models describing experimental skills usually distinguish between three dimensions of an experiment: preparation, performance and data evaluation. Valid assessment procedures for experimental skills have to consider all three of these dimensions. Hands-on tests can especially account for the performance dimension. In large-scale assessments, however, the analysis of students' performances is usually based only on the products of the experiments. Does this test format sufficiently account for a student's ability to carry out experiments? A process-oriented analysis that considers the quality of students' actions, e.g. while setting up an experiment or measuring, provides a broader basis for assessment. At the same time it is more time-consuming. In our study we compared a process-oriented and a product-oriented analysis of hands-on tests. Results show medium correlations between the two analysis methods in the performance dimension and rather high correlations in the preparation and data evaluation dimensions.

Keywords: experimental skills, assessment, process-oriented analysis, product-oriented analysis, science education

BACKGROUND AND FRAMEWORK

The acquisition of experimental skills is widely regarded as an important part of science education (e.g. AAAS 1993, NRC 2012). Thus, there is a demand for assessment tools that allow for a valid measurement of experimental skills. In our study we compared a process-oriented and a product-oriented analysis of students' performances in hands-on tests. The test instrument refers to a specific model of experimental skills.
Modelling experimental skills

In the literature there is a broad consensus concerning typical experimental skills, such as creating an experimental design, setting up an experiment, observing / measuring and interpreting the results (e.g. NRC 2012, DEE 1999). These skills can be assigned to three dimensions of an experimental investigation: preparation, performance and data evaluation. Most models of experimental skills are structured along these three dimensions with different accentuations (Klahr & Dunbar 1988, Hammann 2004, Mayer 2007, Emden & Sumfleth 2012). Our model uses the three dimensions, too. In contrast to other models, it accentuates the performance dimension (Figure 1, adapted from Schreiber, Theyßen & Schecker 2009).
Figure 1: Model of experimental skills: 3 dimensions, 6 components (adapted from Schreiber, Theyßen & Schecker 2009).

At school, an experimental question is usually given to the students and not developed by them. So students have to interpret and clarify the given task. In non-cookbook types of situations, students have to create the experimental design themselves. During the performance, students set up the experiment, and they measure and document data. During data evaluation, students process data, give a solution and interpret the results. This description might suggest a linear order of steps. However, this is not intended: the steps can occur in different orders and in loops.

Measuring experimental skills

Written tests are established instruments for assessing experimental skills. But especially with regard to the performance dimension, their validity is in question (e.g. Shavelson, Ruiz-Primo & Wiley 1999). Other approaches to the assessment of performance skills seem to be necessary (Ruiz-Primo & Shavelson 1996, Stebler, Reusser & Ramseier 1998, Garden 1999). Here, hands-on tests show their potential. However, in large-scale assessments the analysis of students' performance in experimental tasks is usually based only on the products of experimenting, mostly documented in lab sheets (e.g. Stebler, Reusser & Ramseier 1998, Ramseier, Labudde & Adamina 2011, Gut 2012). The processes of experimentation, like setting up the apparatus and making measurements, are considered only indirectly, insofar as they affect the products. On the one hand, it is questionable whether a product-oriented analysis which neglects process aspects of experimenting yields adequate ratings compared to a process-oriented analysis. On the other hand, a process-oriented analysis that considers the quality of students' actions in the performance dimension (e.g. Neumann 2004) is very resource-consuming.
In order to justify the additional effort of a process-based analysis, it has to be shown that ratings from a product-based analysis are insufficient predictors of ratings from a process-based analysis of the same hands-on test, at least for the performance dimension.
RATIONALE AND METHODS

Hypotheses

In our study we investigate correlations between ratings from a product-based and a process-based analysis of students' performances in hands-on tests. In the performance dimension students get direct feedback from the experimental setup. A non-functional electric circuit, for example, may result in a series of changes until the setup finally works. The lab sheet will only show the final result. Similar processes may occur during measurement. A product-based analysis only evaluates the documented (i.e. usually the final) results, while a process-based approach looks at the chain of students' actions. Our first hypothesis is:

H1: Concerning the performance dimension, ratings from a product-oriented analysis are not highly correlated with scores from a process-oriented analysis.

Students prepare the experiment and evaluate their data mostly in written form, without handling experimental devices. We assume a close relationship between what they do and what they document. Thus, our second hypothesis is:

H2: Concerning the preparation and evaluation of the experiment, ratings from a product-oriented analysis are highly correlated with scores from a process-oriented analysis.

As a high correlation we define a correlation above 0.7 (Kendall's τ-b).

Methods

Tasks

For the comparison of the product- and process-based analyses we developed two experimental tasks for the domain of electric circuits in secondary school curricula (Schreiber et al. 2012). The first task is: "Here are three bulbs. Find out the one with the highest power at 6 V." In the second task the students get a set of wires and have to find the best conductor among three metals. Students have a set of apparatus and a pre-structured lab sheet at their disposal. The lab sheet is structured along our model of experimental skills, requesting to plan a suitable experiment, assemble the experimental setup, perform measurements, evaluate the data and draw conclusions.
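For readers who want to reproduce this kind of comparison, Kendall's τ-b can be computed directly from two vectors of ordinal ratings by counting concordant and discordant pairs and applying tie corrections in the denominator. The following pure-Python sketch is illustrative only; the rating data are hypothetical, not the study's:

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation for two equal-length
    sequences of ordinal ratings, with corrections for ties."""
    assert len(x) == len(y) and len(x) > 1
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # pairs tied in x and/or y count as neither
    n = len(x)
    n0 = n * (n - 1) // 2
    n1 = sum(t * (t - 1) // 2 for t in Counter(x).values())  # tie correction for x
    n2 = sum(t * (t - 1) // 2 for t in Counter(y).values())  # tie correction for y
    return (concordant - discordant) / sqrt((n0 - n1) * (n0 - n2))

# Hypothetical quality indices (ordinal levels 1-5) for eight students:
product_ratings = [1, 2, 5, 5, 2, 1, 5, 2]
process_ratings = [1, 3, 4, 5, 2, 2, 5, 3]
tau = kendall_tau_b(product_ratings, process_ratings)
```

The tie corrections matter here because a five-level ordinal scale applied to over a hundred students necessarily produces many tied ratings.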
Both tasks are open-ended and the students have to structure their paths towards the solutions on their own. They are only assisted by written information on the necessary physics content knowledge.

Design

Table 1 shows the design of the study. It was embedded in a more extensive study concerning the comparison of different assessment tools for experimental skills (Schreiber 2012).
Table 1: Design of the study

  pre-test: cognitive skills, content knowledge, self-concept      45 min
  training: introducing the hands-on test                          20 min
  hands-on test: group 1, task 1 ("highest power");
                 group 2, task 2 ("best conductor")                30 min

138 upper secondary students, aged about 16 to 17, took part in this study. In a pre-test we measured personal variables that are supposed to have an influence on students' test performances: cognitive skills, self-concept concerning physics and experimenting in physics, and content knowledge in the field of electricity. Established tests and questionnaires were adapted for this pre-test (Heller & Perleth 2000, Engelhardt & Beichner 2004, von Rhöneck 1988, Brell 2008). In the hands-on test the students worked on one of the two tasks described above. The use of two different tasks was due to the design of the more extensive project into which this study was embedded. The students were assigned to the two groups on the basis of their pre-test results in such a way that a sufficient and similar variance of the personal variables was realized in both groups. In a training session the students were introduced to the hands-on test (structure of the tasks and handling of the devices). The training task was also taken from the domain of electric circuits (measuring the current-voltage characteristic of a bulb). In the hands-on test, students worked with a set of electric devices and a pre-structured lab sheet (Figure 2, task 1).

In the situation shown in Figure 2, the student documents his (inadequate) setup with two multimeters, a battery and a bulb in the lab sheet. The pre-structured lab sheet demands to clarify the question, to document the setup, to perform measurements and to interpret the results. Students can choose when and in which order they fill in the sheet. The lab sheet does not specify a particular solution or approach. Students' actions were videotaped and the lab sheets were collected.

Figure 2: A student performing task 1 of the hands-on test.

Process-oriented analysis

The videos and the lab sheets were analysed according to the components of experimenting shown in Figure 1. The process-oriented analysis leads to a quality index for each student in each of these assessment categories. In a first step, students' actions in the videotape are assigned to one of the six components. A second step of analysis codes the qualities of intermediate stages (e.g. whether an experimental setup is correct, imperfect or wrong) and the development (e.g. whether an imperfect setup is detected and improved) (cf. Schreiber, Theyßen & Schecker 2012, Theyßen et al. 2013). The flow chart in Figure 3 illustrates an example of how the rating decisions are made. The result is a quality index on an ordinal scale with five levels. To secure the validity and reliability of this analysis, several studies with high-inference expert ratings and interviews were conducted (details: Dickmann 2009, Holländer 2009, Fichtner 2011, Dickmann, Schreiber & Theyßen 2012, Schreiber 2012). The evaluation of double coding yields a high objectivity of the ratings (Cohen's κ = .67).

Figure 3: Formal analysis scheme of the sequence analysis, specified for setup skills.

Product-oriented analysis

For the product-oriented analysis only the students' documentation in the lab sheets was analysed, with regard to the same six model components (skills in Figure 1). Each entry in the lab sheet is directly associated with an assessment category. The single criterion is the correctness of the entry. A development cannot be assessed, since in most cases only one result is documented in the sheets. Thus, using the formal analysis scheme (Figure 3), only the levels 1, 2 and 5 can be scored in the product-oriented analysis. Again the objectivity in each assessment category is satisfactory (Cohen's κ > .62).

RESULTS

To test the hypotheses, rank correlations (Kendall's τ-b) between the quality parameters from the product-oriented and the process-oriented analysis were calculated for each category (Table 2). In all four assessment categories that can be assigned to the preparation and data evaluation dimensions, the correlations are high (τ between .73 and .96). For the components of the performance dimension, we found only medium or low correlations (τ ≤ .50). Thus, both hypotheses can be confirmed.
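Both coding procedures report inter-rater agreement from double coding as Cohen's κ. As an illustration of how such a value is obtained, here is a minimal pure-Python sketch; the coder data are hypothetical, not the study's codes:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement between two coders,
    corrected for the agreement expected by chance from the
    coders' marginal category frequencies. Assumes the coders
    do not both assign one identical constant code (expected < 1)."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    marg_a, marg_b = Counter(coder_a), Counter(coder_b)
    expected = sum(marg_a[c] * marg_b[c] for c in marg_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical double coding of quality levels for ten students:
coder_1 = [1, 2, 5, 5, 2, 1, 5, 2, 3, 4]
coder_2 = [1, 2, 5, 4, 2, 1, 5, 2, 3, 4]
kappa = cohens_kappa(coder_1, coder_2)
```

The chance correction is what distinguishes κ from raw percent agreement: two coders who both use level 5 very often would agree frequently by chance alone, and κ discounts exactly that.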
The high correlations in the preparation and data evaluation dimensions can be explained by the data basis: in these dimensions the process-oriented analysis also refers mainly to the documentation in the lab sheets. Only in a few cases did the videos provide further information concerning developments. Thus, regardless of the method of analysis, in the preparation and data evaluation dimensions the scores 1, 2 and 5 dominate.

Table 2: Correlations (Kendall's τ-b) between the product-oriented and the process-oriented analysis. The correlations are highly significant (**) or significant (*). The assessment categories are assigned to the three experimental dimensions: preparation, performance and data evaluation. n: sample size.

  dimension          assessment category                    τ         n
  preparation        apprehend the task                     .877**    138
                     create the experimental design         .728**    138
  performance        set up the experiment                  .499**    130
                     perform & document measurements        .221**    122
  data evaluation    process the data & give a solution     .960**    122
                     interpret the results                  .775**    117

In contrast, in both assessment categories belonging to the performance dimension the process-oriented analysis profits largely from the videos. The videos provide relevant information about the developments while the students work on the setup and make measurements. The low correlations between the process-oriented and the product-oriented analysis are obviously caused by an information gap between the documented setups and measurements on the one hand and the actual setup and measurement processes in the video on the other hand.

A further result can be derived from Table 2: the sample size per category decreases over the course of the experiment. Whereas 138 students clarified the question and created an experimental design in the beginning, only 117 students interpreted the results in the end. This is a noticeable dropout of about 15 %. The reason is the use of an open task format (Fig. 2). The students had to structure the approach on their own and without any assistance. Students who, for example, did not complete the setup were subsequently not able to measure and document data.

CONCLUSIONS

We draw two conclusions from our results:

1. Comparison of a process-oriented and a product-oriented analysis
A product-oriented analysis seems to be sufficient to analyse students' skills in preparing an experiment and evaluating data. But in order to account for performance skills adequately, hands-on tests with a process-oriented analysis of students' actions seem to be necessary. These findings should be considered in the development of more valid assessment procedures.

2. Open task format and sample size
The use of an open task format in testing experimental skills ("Find out ...") causes a noticeable dropout of students during the test. For assessing the full range of experimental skills, we suggest a guided test with non-interdependent sub-tasks. Each sub-task should refer to a specific experimental skill. To allow for a non-interdependent assessment, each item should present a sample solution of the preceding step, e.g. a measurement item should provide a complete experimental setup. We have started to work on such a test format.
REFERENCES

American Association for the Advancement of Science (AAAS) (Ed.) (1993). Benchmarks for Science Literacy. New York: Oxford University Press.

Brell, C. (2008). Lernmedien und Lernerfolg - reale und virtuelle Materialien im Physikunterricht. Empirische Untersuchungen in achten Klassen an Gymnasien zum Laboreinsatz mit Simulationen und IBE. In H. Niedderer, H. Fischler & E. Sumfleth (Eds.), Studien zum Physik- und Chemielernen, Vol. 74. Berlin: Logos.

Department for Education and Employment (DEE) (Ed.) (1999). Science - The National Curriculum for England. London: Department for Education and Employment.

Dickmann, M. (2009). Validierung eines computergestützten Experimentaltests zur Diagnostik experimenteller Kompetenz (unpublished bachelor thesis). Dortmund: Technische Universität Dortmund.

Dickmann, M., Schreiber, N. & Theyßen, H. (2012). Vergleich prozessorientierter Auswertungsverfahren für Experimentaltests. In S. Bernholt (Ed.), Konzepte fachdidaktischer Strukturierung für den Unterricht (pp. 449–451). Münster: LIT.

Emden, M. & Sumfleth, E. (2012). Prozessorientierte Leistungsbewertung des experimentellen Arbeitens. Zur Eignung einer Protokollmethode zur Bewertung von Experimentierprozessen. Der mathematische und naturwissenschaftliche Unterricht (MNU), 65(2), 68–75.

Engelhardt, P. V. & Beichner, R. J. (2004). Students' understanding of direct current resistive electrical circuits. American Journal of Physics, 72(1), 98–115.

Fichtner, A. (2011). Validierung eines schriftlichen Tests zur Experimentierfähigkeit von Schülern (unpublished master thesis). Bremen: Universität Bremen.

Garden, R. (1999). Development of TIMSS Performance Assessment Tasks. Studies in Educational Evaluation, 25(3), 217–241.

Gut, C. (2012). Modellierung und Messung experimenteller Kompetenz. Analyse eines large-scale Experimentiertests. In H. Niedderer, H. Fischler & E. Sumfleth (Eds.), Studien zum Physik- und Chemielernen, Vol. 134. Berlin: Logos.

Hammann, M. (2004).
Kompetenzentwicklungsmodelle: Merkmale und ihre Bedeutung - dargestellt anhand von Kompetenzen beim Experimentieren. Der mathematische und naturwissenschaftliche Unterricht, 57(4), 196–203.

Heller, K. A. & Perleth, C. (2000). Kognitiver Fähigkeitstest für 4.-12. Klassen, Revision (KFT 4-12+ R). Göttingen: Hogrefe.

Holländer, L. K. (2009). Validierung eines Experimentaltests mit Realexperimenten zur Diagnostik experimenteller Kompetenz (unpublished bachelor thesis). Dortmund: Technische Universität Dortmund.

Klahr, D. & Dunbar, K. (1988). Dual Space Search During Scientific Reasoning. Cognitive Science, 12, 1–48.

Mayer, J. (2007). Erkenntnisgewinnung als wissenschaftliches Problemlösen. In D. Krüger & H. Vogt (Eds.), Theorien in der biologiedidaktischen Forschung (pp. 177–186). Berlin, Heidelberg: Springer.

National Research Council (NRC) (Ed.) (2012). A Framework for K-12 Science Education: Practices, Crosscutting Concepts, and Core Ideas. Washington, DC: The National Academies Press.

Neumann, K. (2004). Didaktische Rekonstruktion eines physikalischen Praktikums für Physiker. In H. Niedderer, H. Fischler & E. Sumfleth (Eds.), Studien zum Physik- und Chemielernen, Vol. 38. Berlin: Logos.

Ramseier, E., Labudde, P. & Adamina, M. (2011). Validierung des Kompetenzmodells HarmoS Naturwissenschaften: Fazite und Defizite. Zeitschrift für Didaktik der Naturwissenschaften, 17, 7–33.

Ruiz-Primo, M. A. & Shavelson, R. J. (1996). Rhetoric and Reality in Science Performance Assessments: An Update. Journal of Research in Science Teaching, 33(10), 1045–1063.

Schreiber, N. (2012). Diagnostik experimenteller Kompetenz - Validierung technologiegestützter Testverfahren im Rahmen eines Kompetenzstrukturmodells. In H. Niedderer, H. Fischler & E. Sumfleth (Eds.), Studien zum Physik- und Chemielernen, Vol. 139. Berlin: Logos.

Schreiber, N., Theyßen, H. & Schecker, H. (2009). Experimentelle Kompetenz messen?! Physik und Didaktik in Schule und Hochschule, 8(3), 92–101.

Schreiber, N., Theyßen, H. & Schecker, H. (2012). Experimental Competencies In Science: A Comparison Of Assessment Tools. In C. Bruguière, A. Tiberghien & P. Clément (Eds.), E-Book Proceedings of the ESERA 2011 Conference, Lyon, France. Retrieved from: http://www.esera.org/media/ebook/strand10/ebook-esera2011_schreiber-10.pdf (29.11.2013).

Shavelson, R. J., Ruiz-Primo, M. A. & Wiley, E. W. (1999). Note on Sources of Sampling Variability in Science Performance Assessments. Journal of Educational Measurement, 36(1), 61–71.

Stebler, R., Reusser, K. & Ramseier, E. (1998). Praktische Anwendungsaufgaben zur integrierten Förderung formaler und materialer Kompetenzen - Erträge aus dem TIMSS-Experimentiertest. Bildungsforschung und Bildungspraxis, 20(1), 28–54.
Theyßen, H., Schecker, H., Gut, C., Hopf, M., Kuhn, J., Labudde, P., Müller, A., Schreiber, N. & Vogt, P. (2013). Modelling and Assessing Experimental Competencies in Physics. In C. Bruguière, A. Tiberghien & P. Clément (Eds.), 9th ESERA Conference Contributions: Topics and trends in current science education - Contributions from Science Education Research (pp. 321–337). Dordrecht: Springer.

von Rhöneck, C. (1988). Aufgaben zum Spannungsbegriff. Naturwissenschaften im Unterricht - Physik/Chemie, 36(31), 38–41.