Validation of the NSSE Benchmarks and Deep Approaches to Learning Against Liberal Arts Outcomes. Ernest T. Pascarella and Tricia A. Seifert



Running head: NSSE Validation

Validation of the NSSE Benchmarks and Deep Approaches to Learning Against Liberal Arts Outcomes

Ernest T. Pascarella and Tricia A. Seifert, The University of Iowa
Charles Blaich, Center of Inquiry in the Liberal Arts at Wabash College

Please direct questions regarding this paper to: Ernest T. Pascarella, The University of Iowa, College of Education, N491 Lindquist Center, Iowa City, IA 52242, ernest-pascarella@uiowa.edu

This research was supported by a generous grant from the Center of Inquiry in the Liberal Arts at Wabash College to the Center for Research on Undergraduate Education at The University of Iowa.

ABSTRACT

This study examines the validity of the National Survey of Student Engagement (NSSE) measures of good practices in predicting outcomes associated with a liberal arts education. Using a multi-institutional sample and a pretest-posttest longitudinal design, we find that, net of student background characteristics, the type of institution attended, and other college experiences, one or more of the NSSE measures of good practices in undergraduate education consistently predicted first-year development in students' effective reasoning and problem solving, well-being, inclination to inquire and lifelong learning, intercultural effectiveness, leadership, moral character, and integration of learning. With institutions as the unit of analysis, the NSSE measures of good practices also had a number of substantial positive associations with liberal arts outcomes that persisted even in the presence of controls for a precollege measure of each outcome. The findings lend support to the assumption that the widely used NSSE scales are, in fact, measuring practices that positively influence undergraduate cognitive and personal development across a broad range of outcomes.

Since its inception, the National Survey of Student Engagement (NSSE) has clearly become one of the most broad-based annual surveys of undergraduates in the country. According to the NSSE 2007 Annual Report ("Experiences That Matter: Enhancing Student Learning and Success," 2008), the NSSE survey has been completed by nearly 1.5 million students at nearly 1,200 different colleges and universities in the last decade. In 2008 alone, 774 different colleges and universities were participating in the annual spring administration of the 15-20 minute survey. The NSSE is specifically designed to measure the extent to which college students are engaged in empirically vetted good practices in undergraduate education. Indeed, one of the major assumptions of the NSSE is that in measuring such good practices, one is essentially measuring experiences that yield desired student cognitive and personal development during college. Thus, other things being equal, the greater one's engagement in, or exposure to, these good practices, the more developmentally influential one's undergraduate education, or so the logic goes. In this paper, we analyze longitudinal data from the Wabash National Study of Liberal Arts Education to validate NSSE measures of good practices in undergraduate education against a range of liberal arts outcomes. We find that, net of student background characteristics, the type of institution attended, and other college experiences, one or more of the NSSE measures of good practices in undergraduate education consistently predicted first-year development in students' effective reasoning and problem solving, well-being, inclination to inquire and lifelong learning, intercultural effectiveness, leadership, and moral character.
We find that, with institutions as the unit of analysis, the NSSE measures of good practices had a number of substantial positive associations with liberal arts outcomes that persisted even in the presence of controls for a precollege measure of each outcome. With more extensive controls in place at the

individual student level of analysis, each of the liberal arts outcomes considered was significantly, if modestly, influenced by at least one NSSE good-practice scale. The findings lend a modicum of support for the assumption that the widely used NSSE scales are, in fact, measuring practices that positively influence undergraduate cognitive and personal development across a broad range of outcomes.

Good Practices in Undergraduate Education

In the late 1980s and early 1990s, Chickering and Gamson (1987, 1991) synthesized much of the evidence on the impact of college on students and translated it into seven principles for good practice in undergraduate education. These seven principles are: a) student-faculty contact; b) cooperation among students; c) active learning; d) prompt feedback to students; e) time on task; f) high academic expectations; and g) respect for diverse students and diverse ways of knowing. From an empirical standpoint, these seven dimensions of good practice are well supported. Even in the presence of controls for salient confounding influences, various measures of individual good-practice dimensions have been found to be significantly and positively linked to desired aspects of cognitive and noncognitive growth during college, and to career and personal benefits after college (Astin, 1993; Chickering & Reisser, 1993; Kuh, Schuh, Whitt, & Associates, 1991; Kuh, Kinzie, Schuh, Whitt, & Associates, 2005; Pascarella & Terenzini, 1991, 2005; Seifert, Goodman, Lindsay, Jorgensen, Wolniak, Pascarella, & Blaich, 2008).
Examples of individual studies supporting the predictive validity of specific good practices in undergraduate education would include the following: a) student-faculty interaction (Anaya, 1999; Avalos, 1996; Kuh & Hu, 2001; Terenzini, Springer, Yaeger, Pascarella, & Nora, 1994); b) cooperation among students/cooperative learning (Cabrera, Crissman, Bernal, Nora, Terenzini, & Pascarella, 2002; Johnson, Johnson, & Smith, 1998a, 1998b; Qin, Johnson, & Johnson, 1995); c) active

learning (Hake, 1998; Kuh, Pace, & Vesper, 1997; Lang, 1996; Murry & Lang, 1997); d) prompt feedback to students (d'Apollonia & Abrami, 1997; Feldman, 1997); e) academic effort/time on task (Astin, 1993; Hagedorn, Siadat, Nora, & Pascarella, 1997; Johnstone, Ashbaugh, & Warfield, 2002); f) high academic expectations (Arnold, Kuh, Vesper, & Schuh, 1993; Astin, 1993; Cruce, Wolniak, Seifert, & Pascarella, 2006; Whitmire & Lawrence, 1996); and g) diversity experiences (Gurin, Dey, Hurtado, & Gurin, 2002; Pascarella, Palmer, Moye, & Pierson, 2001; Terenzini et al., 1994; Umbach & Kuh, 2006).

Measuring Vetted Good Practices with the National Survey of Student Engagement

The National Survey of Student Engagement (NSSE) is a 15-20 minute self-report instrument specifically designed to measure the various good practices described above (Kuh, 2001). At the heart of the NSSE are a series of items that ask students to indicate how much they are involved in specific academic and nonacademic activities and programs, the nature of their interactions with faculty and other students, the degree of intellectual challenge in their academic work, their involvement in diversity-related experiences, their perceptions of the supportiveness of the campus environment, and the like. Based on these student reports, NSSE provides participating colleges and universities with a comprehensive assessment of the extent to which their students are engaged in activities, or exposed to practices, shown by the existing body of research to enhance the impact of an undergraduate education. Over time, NSSE has developed various scales or indexes underlying the individual items in the survey instrument. The most prominent and frequently reported are the five NSSE Benchmarks of Effective Educational Practice (hereafter referred to as benchmarks).
These are: Level of Academic Challenge, Active and Collaborative Learning, Student-Faculty Interaction, Enriching Educational Experiences, and Supportive Campus Environment (NSSE, 2006). Level

of Academic Challenge is an eleven-item scale that measures time spent preparing for class, the amount of reading and writing, deep learning, and institutional expectations for academic performance. Active and Collaborative Learning is a seven-item scale that measures the extent of class participation, working collaboratively with other students inside and outside of class, tutoring, and involvement with a community-based project. The Student-Faculty Interaction scale consists of six items and measures the extent of interaction with faculty members and advisors, discussing ideas from classes with faculty members outside of class, getting prompt feedback on academic performance, and working with faculty on a research project. Enriching Educational Experiences is a twelve-item scale that measures the extent of interaction with students of different racial or ethnic backgrounds or with different values or political opinions, using information technology, and participating in activities such as internships, community service, study abroad, and co-curricular activities. Finally, Supportive Campus Environment is a six-item scale measuring the extent to which students perceive that the campus helps them succeed academically and socially, assists them in coping with nonacademic responsibilities, and promotes supportive relations among students and their peers, faculty members, and administrative personnel and offices. [The above descriptions of the five benchmark scales were taken from the College Student Report, 2006 Codebook developed by NSSE (2006).] Table 1 provides the specific items constituting each of the NSSE benchmark scales and the scale reliabilities from the present study's sample.

Table 1 about here

Though more recent and less prominent than the benchmarks, three additional scales developed by NSSE seek to measure deep approaches to learning (hereafter the deep learning scales) (Nelson Laird, Shoup, & Kuh, 2006; Nelson Laird, Shoup, Kuh, & Schwartz, 2008).
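The scale reliabilities reported here are internal-consistency (Cronbach's alpha) coefficients. As a brief illustration of how such a coefficient is computed from an item-response matrix, the sketch below uses simulated responses; the data, the four-item scale length, and the function name are invented for demonstration and are not NSSE items:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) response matrix."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_variance = items.sum(axis=1).var(ddof=1)     # variance of scale totals
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Simulated 4-item scale: each item reflects one latent trait plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
responses = latent + rng.normal(scale=0.8, size=(200, 4))
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Reliabilities like those reported for the study's scales are computed in this spirit: the more strongly the items of a scale covary, the closer alpha comes to 1.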
The

three scales, some of which use items from the benchmarks, are: Higher-Order Learning, Integrative Learning, and Reflective Learning. According to Nelson Laird et al. (2008), the four-item Higher-Order Learning Scale focuses on the amount students believe that their courses emphasize advanced thinking skills such as "analyzing the basic elements of an idea, experience, or theory" and "synthesizing ideas, information, or experiences into new, more complex interpretations" (p. 477). The Integrative Learning Scale consists of five items and measures the amount students participate in activities that require "integrating ideas from various sources," including diverse perspectives in their academic work, and discussing ideas with others outside of class (p. 477). Finally, the three-item Reflective Learning Scale asks how often students "examined the strengths and weaknesses of their own views" and "learned something that changed their understanding" (p. 477). Nelson Laird and his colleagues have also combined the items from the three subscales into an overall deep learning scale, which yields a total score across all 12 items. We present the specific items constituting each of the three deep learning subscales and the overall scale, along with the scale reliabilities from the present sample, in Table 2.

Table 2 about here

Despite its relatively broad-based national use, it seems reasonable to ask whether good practices in undergraduate education, as measured by the rather brief NSSE instrument, actually do predict desired educational outcomes.
With some narrowly focused exceptions (Carini, Kuh, & Klein, 2006; LaNasa, Olson, & Alleman, 2007), however, nearly all the predictive validity evidence in this regard comes from studies that link the various NSSE measures of good practices to student self-reported gains in intellectual and personal development, assessed by a set of 16 items near the end of the NSSE instrument itself (e.g., Hayek, Carini, O'Day, & Kuh,

2002; Kuh & Gonyea, 2003; Pike, 2006; Pike & Kuh, 2005; Pike, Kuh, & Gonyea, 2007; Zhao & Kuh, 2004; Nelson Laird et al., 2008; Umbach & Kuh, 2003; Umbach & Wawrzynski, 2004). Although such self-reported gains can be formed into psychometrically reliable scales, there are serious problems with the internal validity of any findings in which self-reported gains are employed as an outcome or criterion measure (Pascarella, 2001). The key problem with using the NSSE self-reported gains as a criterion is that they are assessed cross-sectionally as one completes the NSSE instrument during college. Consequently, no precollege assessment of self-reported gains is available to take into account a student's receptiveness to educational experiences as reflected by his or her response propensity on these types of items. Two students having the same educational experience could report substantially different gains because they enter college differentially open or receptive to the effects of postsecondary education. Absent a precollege measure of students' response propensities on self-reported gains items (e.g., self-reported gains during high school), it is nearly impossible to take this differential receptiveness to educational experiences into account. Thus, using the NSSE self-reported gains in college as a validity measure for good practices runs a high risk of confounding the effects of exposure to good practices with the particular individual characteristics of the students an institution attracts and admits (Astin & Lee, 2003; Pascarella, 2001). The result is that, at present, we may have only a very limited body of internally valid evidence with respect to the actual predictive validity of the NSSE. This is a serious issue if participating institutions are asked to view the NSSE scales as a proxy for practices in undergraduate education that facilitate student growth in broad-based educational outcomes.
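The confound described above can be made concrete with a small simulation: when students who are more engaged also enter college better prepared, a cross-sectional regression of the outcome on engagement alone overstates the engagement effect, whereas adding a precollege pretest of the outcome, as a pretest-posttest design permits, yields a net estimate close to the true value. All variables and coefficients below are simulated for illustration and are not study data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated data: engaged students enter with higher pretest scores (selection).
pretest = rng.normal(size=n)                          # precollege outcome measure
engagement = 0.4 * pretest + rng.normal(size=n)       # good-practice exposure
posttest = 0.7 * pretest + 0.2 * engagement + rng.normal(scale=0.5, size=n)

# Cross-sectional estimate: posttest regressed on engagement alone.
X_naive = np.column_stack([np.ones(n), engagement])
b_naive = np.linalg.lstsq(X_naive, posttest, rcond=None)[0][1]

# Pretest-posttest estimate: the precollege measure enters as a control.
X_net = np.column_stack([np.ones(n), pretest, engagement])
b_net = np.linalg.lstsq(X_net, posttest, rcond=None)[0][2]

print(round(b_naive, 2), round(b_net, 2))
```

With the pretest controlled, the engagement coefficient approximately recovers the simulated net effect of 0.2; without it, the coefficient absorbs the selection on entering ability.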
Our study addressed the relative paucity of predictive validity evidence for the NSSE by estimating the extent to which the NSSE benchmark and deep learning scales

predict net change during the first year of college across six dimensions of objectively measured liberal arts outcomes.

Methods

Samples

Institutional Sample. The sample in the study consisted of incoming first-year students at 19 four-year and two-year colleges and universities located in 11 different states from four general regions of the United States: Northeast, Southeast, Midwest, and Pacific Coast. Institutions were selected from more than 60 colleges and universities responding to a national invitation to participate in the Wabash National Study of Liberal Arts Education (WNSLAE). Funded by the Center of Inquiry in the Liberal Arts at Wabash College, the WNSLAE is a large, longitudinal investigation of the effects of liberal arts colleges and liberal arts experiences on the cognitive and personal outcomes theoretically associated with a liberal arts education. The institutions were selected to represent differences among colleges and universities nationwide on a variety of characteristics, including institutional type and control, size, location, and patterns of student residence. However, because the study was primarily concerned with the impacts of liberal arts colleges and liberal arts experiences, liberal arts colleges were purposefully over-represented. Our selection technique produced a sample with a wide range of academic selectivity, from some of the most selective institutions in the country to some that were essentially open admissions. There was also substantial variability in undergraduate enrollment, from institutions with entering classes between 3,000 and 6,000 to institutions with entering classes between 250 and 500. According to the 2007 Carnegie Classification of Institutions, 3 of the participating institutions were considered research universities, 3 were regional universities that did not grant the doctorate, 2 were two-year community colleges, and 11 were liberal arts colleges.

Student Sample. The individuals in the sample were first-year, full-time undergraduate students participating in the WNSLAE at each of the 19 institutions in the study. The initial sample was selected in one of two ways. First, for larger institutions, it was selected randomly from the incoming first-year class at each institution. The only exception to this was at the largest participating institution in the study, where the sample was selected randomly from the incoming class in the College of Arts and Sciences. Second, for a number of the smallest institutions in the study (all liberal arts colleges), the sample was the entire incoming first-year class. The students in the sample were invited to participate in a national longitudinal study examining how a college education affects students, with the goal of improving the undergraduate experience. They were informed that they would receive a monetary stipend for their participation in each data collection, and were also assured in writing that any information they provided would be kept in the strictest confidence and never become part of their institutional records.

Data Collection

Initial Data Collection. The initial data collection was conducted in the early fall of 2006 with 4,501 students from the 19 institutions. This first data collection lasted between 90 and 100 minutes, and students were paid a stipend of $50 each for their participation. The data collected included a WNSLAE precollege survey that sought information on student demographic characteristics, family background, high school experiences, political orientation, educational degree plans, and the like. Students also completed a series of instruments that measured dimensions of cognitive and personal development theoretically associated with a liberal arts education. These are described in greater detail in the subsequent section on WNSLAE Outcome/Dependent Measures.

Follow-up Data Collection. The follow-up data collection was conducted in spring 2007. This data collection took about two hours, and participating students were paid an additional stipend of $50 each. Two types of data were collected. The first was based on questionnaire instruments that collected extensive information on students' experiences of college. Two complementary instruments were used: the National Survey of Student Engagement (NSSE), previously described, and the WNSLAE Student Experiences Survey (WSES). However, for the purposes of this study, we focus on information provided by the NSSE. The second type of data consisted of follow-up (or posttest) measures of the instruments assessing dimensions of cognitive and personal development that were first completed in the initial data collection. All students completed the NSSE and WSES prior to completing the follow-up instruments assessing cognitive and personal development. Both the initial and follow-up data collections were administered by ACT (formerly the American College Testing Program). Of the original sample of 4,501 students who participated in the fall 2006 testing, 3,081 participated in the spring 2007 follow-up data collection, for a response rate of 68.5%. These 3,081 students represented 16.2% of the total population of incoming first-year students at the 19 participating institutions. To provide at least some adjustment for potential response bias by sex, race, academic ability, and institution in the sample of students, a weighting algorithm was developed.
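Such a weighting algorithm amounts to post-stratification: within each demographic cell, respondents are weighted up so that the weighted cell totals match the institution's first-year population. The toy sketch below uses a single invented grouping variable and made-up counts purely to show the arithmetic:

```python
# Hypothetical counts for one institution (invented for illustration).
population = {"cell_a": 600, "cell_b": 900}    # first-year class, by cell
respondents = {"cell_a": 250, "cell_b": 500}   # follow-up participants, by cell

# Each respondent in a cell carries the same weight: population / respondents.
weights = {cell: population[cell] / respondents[cell] for cell in population}

# Weighted respondent counts reproduce the population cell totals.
weighted_totals = {cell: weights[cell] * respondents[cell] for cell in population}
print(weights)
```

Such weighting aligns the sample with the population margins on the weighting variables only; it cannot correct for differences on unmeasured characteristics.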
Using information provided by each institution on sex, race, and ACT score (or appropriate SAT equivalent or COMPASS score equivalent for community college students), follow-up participants were weighted up to each institution's first-year undergraduate population by sex (male or female), race (Caucasian, African American/Black, Hispanic/Latino, Asian/Pacific Islander, or other), and ACT (or equivalent score) quartile. While applying weights

in this manner has the effect of making the overall sample more similar to the population from which it was drawn, it cannot adjust for nonresponse bias.

Conceptual Framework for Liberal Arts Outcomes

Since the WNSLAE was fundamentally concerned with understanding the conditions and experiences that constituted an influential liberal arts education, its first task was to conceptually define the desired cognitive and personal outcomes of such an education. Synthesizing much of the literature on liberal arts education, and building on the work of Jones and McEwen (2000), King, Kendall Brown, Lindsay, and VanHecke (2007) developed a comprehensive model of liberal arts outcomes that embraced seven general dimensions: effective reasoning and problem solving, well-being, inclination to inquire and lifelong learning, intercultural effectiveness, leadership, moral character, and integration of learning. Although such outcome dimensions appear central to the undergraduate mission of a large cross-section of American colleges and universities (see, for example, the outcome taxonomy employed by Pascarella & Terenzini, 1991, 2005, in organizing college impact outcomes), the distinctiveness of the liberal arts outcomes lies in the integrated connections among the outcomes and in their holistic nature, which spans cognitive, interpersonal, and intrapersonal domains. Consequently, the WNSLAE was largely guided by this conceptual framework of liberal arts outcomes in selecting specific outcome measures. Indeed, with the single exception of integration of learning, the WNSLAE study was able to identify specific outcome or dependent measures representing six of the seven liberal arts outcomes specified by the King et al. conceptual model.

WNSLAE Outcome/Dependent Measures

Effective Reasoning and Problem Solving. To tap this outcome, we used the critical thinking module from the Collegiate Assessment of Academic Proficiency (CAAP) developed

by the American College Testing Program (ACT). The critical thinking test is a 40-minute, 32-item instrument designed to measure a student's ability to clarify, analyze, evaluate, and extend arguments. The test consists of four passages in a variety of formats (e.g., case studies, debates, dialogues, experimental results, statistical arguments, editorials). Each passage contains a series of arguments that support a general conclusion and a set of multiple-choice test items. The internal consistency reliabilities for the CAAP critical thinking test range between .81 and .82 (ACT, 1991). It correlates .75 with the Watson-Glaser Critical Thinking Appraisal (Pascarella, Bohr, Nora, & Terenzini, 1995).

Well-Being. We operationalized this dimension of liberal arts outcomes with several individual measures. The first was the Ryff Scales of Psychological Well-Being (SPWB) (Ryff, 1989; Ryff & Keyes, 1995). The SPWB is a 54-item, theoretically grounded instrument that specifically focuses on measuring six dimensions of psychological well-being: positive evaluations of oneself (Self-Acceptance), sense of continued growth and development as a person (Personal Growth), belief in a purposeful and meaningful life (Purpose in Life), quality relations with others (Positive Relations with Others), capacity to effectively manage one's life and surrounding world (Environmental Mastery), and sense of self-determination (Autonomy) (Ryff & Keyes, 1995; Ryff, 1989; Keyes, Shmotkin, & Ryff, 2002). Due to recent concerns about the construct validity and interpretation of the six subscales (Springer & Hauser, 2006; Springer, Hauser, & Freese, 2006), we used the total score in this study. The total score for the SPWB had a reliability of .88. The SPWB tends to have significant, positive associations with frequently used measures of happiness and satisfaction, and negative associations with depression (Ryff & Keyes, 1995).

Inclination to Inquire and Lifelong Learning. This outcome was operationally represented with two scales. The primary measure was the 18-item Need for Cognition Scale (NCS). Need for cognition refers to "an individual's tendency to engage in and enjoy effortful cognitive activity" (Cacioppo, Petty, Feinstein, & Jarvis, 1996, p. 197). Those who have a high need for cognition tend to "seek, acquire, think about, reflect back on information to make sense of stimuli, relationships, and events in their world" (p. 198). In contrast, those with a low need for cognition are more likely to rely on others (such as celebrities and experts), cognitive heuristics, or social comparison processes to provide or make sense of their world. The reliability of the NCS ranges from .83 to .91 in samples of undergraduate students (Cacioppo et al., 1996). With samples of undergraduates, the NCS has been positively associated with the tendency to generate complex attributions for human behavior, high levels of verbal ability, engagement in evaluative responding, one's desire to maximize information gained rather than maintain one's perceived reality (Cacioppo et al., 1996), and college grades (Elias & Loomis, 2002). The NCS is negatively linked with authoritarianism, need for closure, personal need for structure, the tendency to respond to information reception tasks with anxiety, and chronic concern regarding self-presentation (Cacioppo et al., 1996). The second measure, designed to tap continuing motivation for lifelong learning, was a six-item measure entitled the Positive Attitude Toward Literacy Scale (PATL). The PATL assesses students' enjoyment of such literacy activities as reading poetry and literature, reading scientific and historical material, and expressing ideas in writing, and has an internal consistency reliability of .71.
The PATL score at entrance to college correlated .36 with three-year cumulative scores during college on a measure of library use, .48 with the cumulative number of unassigned books
read during three years of college, and .26 with a measure of reading comprehension administered after three years of college (Bray, Pascarella, & Pierson, 2004).

Intercultural Effectiveness. This outcome dimension was measured with two scales. The primary measure was the 15-item short form of the Miville-Guzman Universality-Diversity Scale (M-GUDS). The M-GUDS measures an individual's universal-diverse orientation, which is defined as an attitude of awareness and acceptance of both similarities and differences that exist among people (Miville, Gelso, Pannu, Liu, Touradji, Holloway, & Fuertes, 1999; Fuertes, Miville, Mohr, Sedlacek, & Gretchen, 2000). The instrument has a total scale score and three subscale scores: Diversity of Contact (interest in and commitment to participating in diverse, intentionally focused social and cultural activities), Relativistic Appreciation (appreciation of both similarities and differences in people and the impact of these on one's self-understanding and personal growth), and Comfort with Differences (the degree of comfort with diverse individuals). The internal consistency reliability for the total M-GUDS score in the present study was .85, while reliabilities for the three subscales ranged from .77 to .78. The precollege total M-GUDS score correlated .47 with a measure of students' experiences and interactions with diverse others and diverse ideas during the first year of college. In the present study, we used the M-GUDS total score.

The second instrument used to assess student growth in intercultural effectiveness was the seven-item Openness to Diversity/Challenge (ODC) scale. This scale measures one's openness to cultural and racial diversity as well as the extent to which one enjoys being challenged by different perspectives, values, and ideas (Pascarella, Edison, Nora, Hagedorn, & Terenzini, 1996).
The ODC had internal consistency reliabilities in the present study ranging from .83 to .87, depending on the data collection wave. In previous research, precollege ODC
scores have significantly predicted the likelihood of participating in a racial/cultural workshop during the first year of college (Whitt, Edison, Pascarella, Terenzini, & Nora, 2001). In the present study, precollege ODC scores correlated .37 with a measure of students' experiences and interactions with diverse others and diverse ideas during the first year of college.

Leadership. This outcome dimension was assessed with the 68-item, revised version II of the Socially Responsible Leadership Scale (SRLS). The SRLS measures the eight dimensions of Astin's Social Change Model of leadership development (Astin, A., Astin, H., Boatsman, Bonous-Hammarth, Chambers, Goldberg, et al., 1996). According to this model, leadership is a collaborative group process directed toward promoting positive social change in an organization or community (Tyree, 1998). A person who demonstrates strong socially responsible leadership capabilities is self-aware, acts in accordance with personal values and beliefs, invests time and energy in activities that he or she believes are important, works with diverse others to accomplish common goals, has a sense of civic and social responsibility, and desires to make the world a better place. The SRLS was developed specifically to measure leadership in college students. The instrument has eight scales corresponding to the eight dimensions of leadership specified in the Astin model (Astin et al., 1996; Dugan, 2006).
The eight scales are: a) Consciousness of Self - being aware of the values, emotions, attitudes, and beliefs that motivate one to take action; b) Congruence - thinking, feeling, and behaving with consistency, genuineness, authenticity, and honesty toward others; c) Commitment - intensity and duration in relation to a person, idea, or activity; the energy and passion that propel one to act; d) Collaboration - working with others in a common effort; e) Common Purpose - working with others within a shared set of aims and values; f) Controversy with Civility - recognizing two fundamental realities of any group effort: 1) differences of viewpoint are inevitable and valuable, and 2) such differences must be
aired openly and with respect and courtesy; g) Citizenship - believing in a process whereby a person or group is responsibly connected to the environment and the community; and h) Change - adapting to continuously evolving environments and situations, while maintaining the primary functions of the group. The internal consistency reliabilities for the eight subscales of the SRLS in the present study ranged from .77 to .88. Previous research has shown the various scales of the SRLS discriminate between involved and uninvolved undergraduate students in community service, student organizational membership, formal leadership programs, and positional leadership roles (Dugan, 2006). Additional research by Rubin (2000) has demonstrated that undergraduates identified as emerging student leaders tend to score significantly higher on the SRLS congruence, collaboration, common purpose, citizenship, and change scales than a control group of students not identified as emerging student leaders. Because the SRLS does not have a total score, we used the eight SRLS subscales.

Moral Character. We assessed the outcome dimension of Moral Character with the Defining Issues Test 2 (DIT2). The DIT2 is a revised version of James Rest's original DIT from 1979 that measures one component of moral development, known as moral judgment or reasoning (Rest, Narvaez, Thoma, & Bebeau, 1999). The DIT2 presents several dilemmas about social problems, such as whether a starving man should steal food for his family from someone who is hoarding resources. After each dilemma, a series of 12 items representing different issues that might be raised by the problem is presented. For example, in the scenario described above, the items include such questions as: "Would stealing bring about more total good for everybody concerned or wouldn't it?" and "Shouldn't the community's laws be upheld?" In response to the scenario and questions, respondents are asked to do three things:

1. make an action choice (i.e., yes, one should steal, or no, one should not steal);
2. rate the series of 12 items in terms of their importance in making a decision about the scenario; and
3. rank the top four most important items.

The DIT2 produces two relevant scores. The first is the P-score, which represents the degree to which an individual uses higher order (principled/post-conventional) moral reasoning in resolving the moral issues presented in each scenario. The P-score is the proportion of items selected that appeal to moral ideas and/or theoretical frameworks for resolving complex moral issues; specifically, items that appeal to consensus-building procedures, insistence on due process, safeguarding minimal basic rights, and organizing social arrangements in terms of appealing to ideals. The P-score has internal consistency reliabilities ranging from .74 to .77 (Rest et al., 1999; University of Minnesota, n.d.). The second score from the DIT2, particularly salient to the WNSLAE, is the N2-score. As with the P-score, the N2-score reflects the extent to which one exhibits higher order moral reasoning. However, the relatively new N2-score also reflects the extent to which one rejects ideas because they are simplistic or biased (Bebeau & Thoma, 2003). The internal consistency reliabilities for the N2-score range from .77 to .81 (Rest et al., 1999; University of Minnesota, n.d.). An extensive body of evidence supports the validity of the DIT in predicting principled ethical behavior in a number of areas. These include: resistance to cheating, peer pressure, and unlawful or oppressive authority; whistle-blowing on corruption; the keeping of contractual promises; helping behavior; community involvement; ethical behavior in several professions; clinical performance in nursing students; and social/political activism (see Pascarella & Terenzini, 1991, 2005, for a synthesis of this body of evidence, including citations to original
studies). The vast majority of the validity data on the DIT is based on the P-score. However, correlations between the P-score and the N2-score ranged from .91 to .92, suggesting they measure essentially the same construct. For the purposes of this study, we chose to analyze the more comprehensive and recently developed N2-score.

With two exceptions, all 3,081 participants completed each WNSLAE outcome/dependent measure during both the initial data collection in fall 2006 and the follow-up data collection in spring 2007. The two exceptions were the CAAP Critical Thinking Test and the Defining Issues Test. Because each of these instruments took at least 40 minutes to complete, we were concerned with the amount of time required of students during each data collection. To remedy this concern, we randomly divided the study participants during the first data collection into two approximately equal samples. One random sample then took the CAAP Critical Thinking Test during both waves of data collection, and the other random sample took the Defining Issues Test during both data collection waves. Of the 3,081 students participating in both data collections, 1,485 had useable responses on the CAAP and 1,584 had useable responses on the DIT.

NSSE Measures of Good Practices in Undergraduate Education

The independent variables in the study were four of the five NSSE benchmark scales of effective educational practice (Level of Academic Challenge, Active and Collaborative Learning, Student-Faculty Interaction, Supportive Campus Environment) and the three deep approaches to learning scales (Higher-Order Learning, Integrative Learning, Reflective Learning) previously described. The Enriching Educational Experiences scale was not used in individual-level analyses because it had a particularly low internal consistency reliability in the WNSLAE sample (alpha =
.44). However, we did use the Enriching Educational Experiences scale in institution-level analyses, where scale reliability is less of an issue.

Control Variables

A particular methodological strength of the Wabash National Study of Liberal Arts Education is that it is longitudinal in nature. This permitted us to introduce a wide range of statistical controls, not only for student background and precollege traits and experiences, but also for other experiences during the first year of college. Our control variables in the present study included the following:

A parallel precollege measure for each liberal arts outcome measure.

Tested precollege academic preparation. This was the student's ACT score, SAT equivalent score, or COMPASS equivalent score for community college students.

Sex.

Race (coded as 1 = White, 0 = non-White).

Average parental education. This was computed as the average of the respondent's parents' education, provided that the student gave a response for at least one parent. The item asked "What is the highest level of education each of your parents/guardians completed?" The response options were: 1 = Did not finish high school, 2 = High school graduate/GED, 3 = Attended college but no degree, 4 = Vocational/technical certificate or diploma, 5 = Associate or other 2-year degree, 6 = Bachelor's or other 4-year degree, 7 = Master's, 8 = Law, 9 = Doctorate.

High school involvement. This was a seven-item scale with an internal consistency reliability of .58 that measured involvement during high school. Examples of constituent items include: During your last year in high school, how often did you
study with a friend? During your last year in high school, how often did you talk with teachers outside of class? During your last year in high school, how often did you participate in extracurricular activities? Response options were very often, often, occasionally, rarely, or never. Scores on the scale were obtained during the initial data collection in fall 2006.

Precollege academic motivation. This was an eight-item, Likert-type scale in which respondents were asked to indicate the extent to which they agreed or disagreed (strongly agree, agree, not sure, disagree, strongly disagree) with statements about their academic motivation. These statements included: a willingness to work hard to learn material even if it doesn't lead to a higher grade, the importance of getting good grades, reading more for a class than required, enjoyment of academic challenge, and the importance of academic experiences in college. The internal consistency reliability for the scale was .69, and scores on the scale were obtained during the initial data collection in fall 2006.

Hours per week during the first year of college one worked both on and off campus. There were eight response options, from zero to more than 30 hours.

Lived in campus housing (coded 1) versus elsewhere (coded 0) during the first year of college.

The liberal arts emphasis of one's first-year coursework. [Operationalized as the total number of courses during the first year of college taken in traditional liberal arts areas: Fine Arts, Humanities, and Languages (e.g., art, music, philosophy, religion, history); Mathematics/Statistics/Computer Science; Natural Sciences (e.g.,
chemistry, physics); and Social Science (e.g., anthropology, economics, psychology, political science, sociology).]

Institutional type. This was operationally defined as three dummy variables representing attendance at a research university, regional university, or community college (each coded 1), with attendance at a liberal arts college always coded 0.

Information on work responsibilities, place of residence, and first-year coursework was obtained during the follow-up data collection in spring 2007.

Analyses

Because of the limited number of institutions in the WNSLAE (N = 19), we considered hierarchical linear modeling (HLM) to be inappropriate for our data (although with sufficient numbers of institutions in the sample it would have been the analytical method of choice). Consequently, we conducted separate analyses with institutions and individuals as the unit of analysis. Because some of the items in the NSSE benchmark scales are also used in the deep learning scales, we conducted analyses for the benchmark scales and the deep learning scales separately. At the institutional level (N = 19), we first computed simple correlations between the average NSSE benchmark scales and the average deep learning scales at each institution and each liberal arts outcome. We then computed partial correlations between each of the average institution-level benchmark and deep learning scale scores and the institutional average score on each outcome, controlling for the institutional average precollege score on that outcome. The results of these analyses provided what might be considered, respectively, upper- and lower-bounds estimates of the effects of the NSSE benchmark and deep learning scales on the liberal arts outcomes at the institutional level. In addition, because we control for the pretest in the lower-bounds estimates, this is essentially the same as estimating the effects of the various
NSSE scales on the net change in each outcome measure during the first year of college (Pascarella, Wolniak, & Pierson, 2003). Because of the extremely small sample size in the institutional-level analyses, the critical alpha level was set at .10.

A similar set of analyses was conducted with individual students as the unit of analysis (N = 2,861 for all outcomes except critical thinking, N = 1,426, and moral reasoning, N = 1,446). To get an upper-bounds estimate, we simply regressed each liberal arts outcome on the NSSE benchmark scales (excluding the Enriching Educational Experiences scale) and the deep learning scales without controls. To get a lower-bounds estimate, we repeated the analyses introducing controls for: a parallel pretest, tested precollege academic preparation, sex, race, parental education, measures of high school involvement and precollege academic motivation, institutional type (research university, regional institution, or community college, with liberal arts college as the reference group), place of residence and work responsibilities during the first year of college, and the liberal arts emphasis of one's coursework at college. In all individual-level analyses, we also adjusted for the clustered or nested nature of our data. The clustering results from the fact that the individuals in our sample were not drawn as a simple random sample of individuals; rather, their postsecondary institution was the primary sampling unit. Because students within a school are more similar to one another than to students at other schools, the error terms from the prediction model are correlated, which violates one of the assumptions of ordinary least squares regression and results in underestimated standard errors (Ethington, 1997; Raudenbush & Bryk, 2001). We accounted for the nested nature of the data by using statistical techniques that adjust for this clustering (Groves et al., 2004).
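The institution-level upper- and lower-bounds logic amounts to comparing a zero-order correlation with a partial correlation in which the pretest is residualized out of both variables. A minimal numpy sketch of that computation (the arrays below are simulated for illustration, not WNSLAE data):

```python
import numpy as np

def residualize(v, z):
    """Residuals of v after an OLS regression on z (with intercept)."""
    Z = np.column_stack([np.ones_like(z), z])
    beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ beta

def partial_corr(x, y, z):
    """Correlation between x and y, controlling for z (lower-bounds style)."""
    return np.corrcoef(residualize(x, z), residualize(y, z))[0, 1]

# Hypothetical institution-level means for 19 schools: a benchmark score,
# a precollege (pretest) outcome score, and a first-year outcome score.
rng = np.random.default_rng(19)
pretest = rng.normal(size=19)
benchmark = rng.normal(size=19)
outcome = pretest + 0.5 * benchmark + rng.normal(scale=0.5, size=19)

r = np.corrcoef(benchmark, outcome)[0, 1]       # upper-bounds estimate
pr = partial_corr(benchmark, outcome, pretest)  # lower-bounds estimate
print(f"r = {r:.3f}, pr = {pr:.3f}")
```

For the individual-level models, the same pretest simply enters the regression as one of the controls, and the standard errors are adjusted for students being clustered within the 19 institutions.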

Results

Institutional-Level

We summarize the upper- and lower-bounds estimates of the associations between the institutional average NSSE benchmark scale scores and the institutional average liberal arts outcome scores in Table 3. The upper-bounds estimate is the zero-order correlation (column r in Table 3), while the lower-bounds estimate is the partial correlation, controlling for the institutional average precollege score on each liberal arts outcome (column pr in Table 3). As the table shows, at least one NSSE benchmark had a significant, positive zero-order correlation with each of the 15 liberal arts outcome scores. Not surprisingly, when a control was introduced for the respective average precollege liberal arts outcome score, both the magnitude of the resultant partial correlations (column pr) and their statistical significance tended to be reduced across most, though not all, outcomes. Net of the influence of the precollege score, at least one NSSE benchmark had a significant partial correlation with all but five of the first-year liberal arts outcomes; the exceptions were need for cognition and the congruence, commitment, collaboration, and citizenship scales of the socially responsible leadership measure.

Table 3 about here

Of all the NSSE benchmark scales, the supportive campus environment benchmark had the most significant partial correlations (six) with the liberal arts outcomes. These statistically significant partial correlations were concentrated in three areas: psychological well-being (pr = .725); intercultural effectiveness (pr = .483 and .433, respectively, with the Miville-Guzman universality-diversity scale and the openness to diversity scale); and socially responsible leadership (pr = .435, .478, and .666, respectively, with the consciousness of self, common purpose, and change scales). Both the level of academic challenge and enriching educational experiences
benchmarks had significant partial correlations with four liberal arts outcomes. Each benchmark had a significant partial correlation with first-year critical thinking (pr = .433 for academic challenge and .441 for enriching experiences). Aside from their common association with critical thinking, however, academic challenge and enriching educational experiences had significant partial correlations with different liberal arts outcomes. Academic challenge had significant partial correlations with positive attitude toward literacy (pr = .512), consciousness of self (pr = .499), and controversy with civility (pr = .625), while enriching educational experiences had significant partial correlations with the Miville-Guzman universality-diversity scale (pr = .572), openness to diversity (pr = .410), and moral reasoning as measured by the DIT-N2 score (pr = .444). Both the active and collaborative learning and student-faculty interaction benchmarks had significant partial correlations with only one liberal arts outcome. Net of the average institutional-level precollege score, active and collaborative learning significantly predicted the openness to diversity score (pr = .559), while student-faculty interaction had a significant partial correlation (pr = .424) with the change score of the socially responsible leadership measure.

Table 4 summarizes the upper- and lower-bounds estimates of the associations between the institutional average NSSE deep learning scale scores and the institutional average liberal arts outcome scores. Again, the upper-bounds estimate is the zero-order correlation (column r), while the lower-bounds estimate is the partial correlation, controlling for the institutional average precollege score on each liberal arts outcome (column pr). Overall, with institutions as the unit of analysis, the NSSE deep learning scales did not predict the first-year liberal arts outcomes as well as did the NSSE benchmarks.
The deep learning scales had significant, positive zero-order correlations with 11 of the 15 liberal arts outcome measures. However, when a control
for the respective average precollege liberal arts outcome score was introduced, the deep learning scales had statistically significant, positive partial correlations with only four of the liberal arts outcome measures. Moreover, the significant, positive partial correlations appeared to be concentrated in three areas: inclination to inquire and lifelong learning, intercultural effectiveness, and socially responsible leadership. Significant, positive partial correlations were found between the need for cognition scale and the integrative learning (pr = .465), reflective learning (pr = .562), and deep learning total (pr = .502) scores. Similarly, the deep learning total score (pr = .546), the integrative learning score (pr = .591), and the higher order learning score (pr = .435) each had a significant, positive partial association with the openness to diversity scale. Finally, all four deep learning scales had a significant, positive partial correlation with the controversy with civility scale of the socially responsible leadership measure: total score, pr = .689; higher order learning, pr = .660; integrative learning, pr = .549; and reflective learning, pr = .705. In contrast with the NSSE benchmarks, the NSSE deep learning scales had no significant relationship, net of precollege scores, with critical thinking, psychological well-being, positive attitude toward literacy, the Miville-Guzman universality-diversity scale, or moral reasoning as measured by the DIT-N2 score. Furthermore, all four deep learning scales had a significant, negative partial correlation with the collaboration scale of the socially responsible leadership measure.

Table 4 about here

Individual-Level

We present the upper- and lower-bounds estimates of the standardized (β) associations between the four individual student-level NSSE benchmark scores and individual-level liberal arts
outcome scores in Table 5. (Recall that we excluded the enriching educational experiences benchmark from the individual-level analyses because of its low reliability in the present sample.) The upper-bounds estimate (column U in Table 5) controls only for the other NSSE benchmark scales. The lower-bounds estimate (column L in Table 5) controls not only for the other benchmark scales, but also for a parallel precollege measure of the respective liberal arts outcome, tested precollege academic preparation, race, sex, parental education, precollege academic motivation, high school involvement, place of residence and work responsibilities during college, the liberal arts emphasis of first-year college coursework, and institutional type. Both the upper- and lower-bounds estimates account for the clustering or nesting effect of the data. As the table shows, at least one of the four NSSE benchmark scales included in the individual-level analyses had a significant, positive upper-bounds standardized association with each of the 15 liberal arts outcome scores. As expected, when additional controls were introduced to obtain the lower-bounds estimates, a substantial number of these associations became smaller and nonsignificant (column L). However, even with the substantially more stringent controls in the lower-bounds estimates, all 15 liberal arts outcome measures had a significant and positive, though modest, association with at least one of the four benchmark scales.

Table 5 about here

Students' perceived levels of academic challenge had the widest range of estimated effects in the lower-bounds equations. Net of an extensive battery of controls, level of academic challenge had significant, positive standardized associations with psychological well-being (β = .050), both measures of intercultural effectiveness (the Miville-Guzman, β = .110, and the