Hoke Wilson, PhD
Director, Center for Workforce Development and Social Policy Research
Optimal Solutions Group
Session Objectives
- To motivate practitioners to embrace evaluation as a standard practice
- To describe four general categories of evaluation
- To use the first and second categories, those we are most familiar with, to inform our thoughts about the third and fourth
- To visualize the issues we will have to confront if we are to successfully conduct an impact, or even outcome, evaluation
Why Evaluate?
- Because we care about what we're doing and want to get it right
- Because, like it or not, the evaluation tsunami is coming.
Why Evaluate?
This just in! Americans don't trust their government!
- Only 20% of Americans express confidence in their government (Pew, Oct. 2013)
- Almost half of all Americans do not trust federal agencies to carry out their missions (Gallup, Sept. 2013)
- Americans aren't thrilled about paying taxes/funding government (http://hoke'sTotallyUnsubstantiatedOpinion.org)
Why Evaluate?
The Government has something to sell: itself!
- "Endeavoring to be smarter, more innovative, and more accountable by applying existing evidence about what works, generating new knowledge, and using experimentation and innovation to test new approaches to program delivery" [1]
- "Spending decisions [will be] based not only on good intentions, but also on strong evidence that yield the highest social returns on carefully targeted investments" [2]
[1] http://www.whitehouse.gov/sites/default/files/omb/memoranda/2013/m-13-17.pdf
[2] http://www.gpo.gov/fdsys/pkg/budget-2014-PER/pdf/BUDGET-2014-PER.pdf
Why Evaluate?
Currently, some agencies (not FNS) are having trouble with adoption, but they're getting there.

Four basic types of evaluation
- Process
- Formative
- Outcome
- Impact
Process Evaluation ("Bean Counting")
- Tracking the actual implementation: e.g., its delivery, who it reached, what resources were used
- Used to determine whether the intervention was delivered as designed
- Helps identify barriers to administration and strategies to overcome these barriers in the future
- If you're a SNAP-Ed'er, you already do this.
Education and Administrative Reporting System (EARS)
Provides data and information about the activities of all States participating in SNAP-Ed.
Data collected include:
- Counts of SNAP-Ed participants; their ages, gender, race, and ethnicity
- The number and length of sessions attended
- Expenditures by source of funding (administrative $$ vs. program delivery $$)
- Educational content
Getting Started: Formative Evaluation
- Often used at the nascent stages of intervention planning. What's going on here?
- Often used when developing a questionnaire. Do my people understand what I'm trying to ask them?
- Data collection may draw on focus groups, literature reviews, ethnographic interviews, or even anecdotal evidence.
- Results are informative, not definitive!
Evidence-Based Practice [1]
A 5-step program widely used in the health care professions for assessing evidence in a systematic way and choosing a course of action.
A structured approach to formative evaluation!
[1] Evidence-Based Practice: An Interprofessional Tutorial, University of Minnesota Bio-Medical Library, http://www.biomed.lib.umn.edu/learn/ebp/index.html
Evidence-Based Practice (Step 1: Question Formulation)
- Problem: What seems to be the matter? (needs assessment)
- Intervention: How do you propose to set things right?
- Comparison: What approaches have others taken?
- Outcome(s): What would you like to see happen?
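One way to picture Step 1 is as a small structured record that a team fills in before doing anything else. The sketch below is only illustrative; the class name and the filled-in program details are hypothetical placeholders, not drawn from any actual SNAP-Ed plan.

```python
# Minimal sketch of Step 1 (question formulation).  All example values are
# hypothetical placeholders, not taken from any real SNAP-Ed evaluation plan.
from dataclasses import dataclass, field

@dataclass
class EvaluationQuestion:
    problem: str                 # What seems to be the matter?  (needs assessment)
    intervention: str            # How do you propose to set things right?
    comparison: str              # What approaches have others taken?
    outcomes: list = field(default_factory=list)   # What would you like to see happen?

question = EvaluationQuestion(
    problem="Low fruit and vegetable consumption among participating households",
    intervention="A series of hands-on nutrition-education classes",
    comparison="No structured education / approaches reported in the literature",
    outcomes=["Increase in self-reported daily F&V servings at post-test"],
)
print(question)
```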
Evidence-Based Practice (Step 2: Literature Review)
Evidence-Based Practice (Step 3: Critical Appraisal)
A. Was the population under study appropriate (sample frame!)?
B. Was assignment to treatments/conditions randomized?
- Intention-to-treat: no disappearing subjects. Why is this important?
- Were the treatment groups similar at the beginning of the trial?
Evidence-Based Practice (Step 3: Critical Appraisal, continued)
C. Were the outcomes measured identically for all groups? (Treatments too, if more than one dosage level!)
- Were treatment and comparison groups, at the end of the trial, similar in all ways except the outcome of interest?
- Was the follow-up accurate?
Evidence-Based Practice (Step 4: Applying the Evidence)
- Steps 1 through 3 address internal validity: are the studies you've reviewed valid for the populations and situations the authors had to deal with?
- Here we need to ask whether the studies have external validity from your perspective! Can they be pulled out of their specifics and applied to other, similar situations without a complete overhaul?
Evidence-Based Practice (Step 5: Re-evaluation!)
- After a careful critique of all the work that has come before you...
- After the careful and methodical application of your chosen strategy to resolving the problem confronting you...
Did it work?? There is only ONE way to find out!!!
Evidence-Based Practice
We have used the best possible evidence to formulate our approach, but precisely what is evidence?
Evidence ≠ Proof!
Outcome Evaluation
- Addresses whether changes in your outcome variable occurred in conjunction with your intervention
- Indicates the degree of change but cannot ascribe the source of the change
- Something did or did not happen. Why it happened we cannot tell.
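Numerically, an outcome evaluation usually boils down to a pre/post comparison on the same respondents. The sketch below assumes SciPy is available; the scores are made-up illustrations, not program data. Note what the test can and cannot say: it flags a change, not a cause.

```python
# Minimal sketch: did the outcome move from pre-test to post-test?
# The scores below are illustrative only, not real program data.
from scipy import stats

pre  = [2.1, 1.8, 2.4, 2.0, 1.9, 2.3, 2.2, 1.7]   # e.g., daily F&V servings before
post = [2.6, 2.0, 2.9, 2.4, 2.1, 2.8, 2.5, 2.0]   # e.g., daily F&V servings after

res = stats.ttest_rel(post, pre)                   # paired test on the same people
mean_change = sum(post) / len(post) - sum(pre) / len(pre)
print(f"mean change = {mean_change:.2f}, p-value = {res.pvalue:.3f}")
# A small p-value says the change is unlikely to be chance variation alone;
# it does NOT say the intervention caused it -- that is the outcome-evaluation limit.
```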
Outcome Evaluation (How high is up?)
[Chart: outcome (y-axis) for intervention and comparison groups at pre-test and post-test (x-axis)]
Outcome Evaluation (How low is down?)
[Chart: outcome (y-axis) for intervention and comparison groups at pre-test and post-test (x-axis)]
Outcome Evaluation (Pitfalls: clustered samples)
[Chart: pre/post outcome difference (y-axis) by dosage level (x-axis) for Class 1, Class 2, and Class 3]
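The pitfall behind clustered samples is that students in the same class resemble one another, so their surveys are not fully independent. One common way to quantify this, shown in the sketch below, is the design-effect approximation DEFF = 1 + (m - 1) * ICC; this is a general rule of thumb rather than anything specific to SNAP-Ed, and the numbers are illustrative.

```python
# Hedged sketch of why clustering matters.  DEFF = 1 + (m - 1) * ICC, where m is
# the average cluster (class) size and ICC is the intra-class correlation.
# All input values below are illustrative assumptions.
def design_effect(avg_cluster_size: float, icc: float) -> float:
    return 1 + (avg_cluster_size - 1) * icc

m, icc = 25, 0.05            # e.g., 25 students per class, modest within-class similarity
deff = design_effect(m, icc)
n_collected = 300            # clustered surveys actually collected (illustrative)

print(f"DEFF = {deff:.2f}")
print(f"{n_collected} clustered surveys carry about {n_collected / deff:.0f} surveys' worth of information")
print(f"To match {n_collected} independent surveys you would need about {n_collected * deff:.0f}")
```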
Outcome Evaluation (Pitfalls: not considering variation in response)
[Chart: number of units (y-axis) versus outcome (x-axis), with separate pre and post distributions]
Outcome Evaluation (How do I know how big the variation might be?)
You won't know until you're done, but...
- Let experience be your guide. What are the highest and lowest values of the outcome variable that you have somewhat commonly seen? Will the change you expect to bring about fall within this range/interval?
- Let your evidence-based formative evaluation be your guide. Most research papers will provide a standard deviation (sd) or standard error (se) somewhere, in some form. Take your expected baseline value and add two sds. Does your expected final value fall within this range? (See the sketch below.)
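A literal, minimal version of that "baseline plus two standard deviations" check is sketched below. The baseline, standard deviation, and expected post-test value are illustrative assumptions, not figures from any study.

```python
# Sketch of the rule of thumb above: is the change you hope to produce inside
# the range you would expect from ordinary variation?  Numbers are illustrative.
baseline_mean = 2.0    # expected pre-test value (e.g., daily F&V servings)
sd = 0.6               # standard deviation pulled from a comparable published study
expected_post = 2.4    # the value you hope to reach after the intervention

upper = baseline_mean + 2 * sd
print(f"Ordinary variation reaches up to about {upper:.1f}")
if expected_post <= upper:
    print("Your expected change sits inside ordinary variation -- plan for a larger sample.")
else:
    print("Your expected change is large relative to the variation -- a smaller sample may do.")
```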
Outcome Evaluation (So what does this mean?)
- All other things being equal, the bigger the change you expect to bring about, the fewer people/surveys you will need to demonstrate an effect!
- All other things being equal, the wider the variation in potential outcomes, the more people/surveys you will need to demonstrate an effect! (The sample-size sketch below makes both statements concrete.)
- In all events, with an outcome evaluation we will only be able to tell whether there was a change. We will never be able to be certain of what motivated it.
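The standard two-group sample-size approximation puts numbers on those two "all other things being equal" statements. The sketch below assumes 5% two-sided significance and 80% power, both conventional defaults rather than anything prescribed by the deck, and the deltas and standard deviations are illustrative.

```python
# Sketch of the usual two-group sample-size approximation:
#   n per group ~= 2 * (z_{alpha/2} + z_{beta})^2 * sd^2 / delta^2
# Assumes 5% two-sided significance and 80% power; inputs are illustrative.
from statistics import NormalDist

def n_per_group(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> float:
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)     # ~1.96 for a 5% two-sided test
    z_b = z.inv_cdf(power)             # ~0.84 for 80% power
    return 2 * (z_a + z_b) ** 2 * (sd / delta) ** 2

print(round(n_per_group(delta=0.4, sd=0.6)))   # ~35 per group: big change, modest variation
print(round(n_per_group(delta=0.2, sd=0.6)))   # ~141 per group: halve the change, ~4x the sample
print(round(n_per_group(delta=0.4, sd=1.2)))   # ~141 per group: double the variation, ~4x the sample
```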
Outcome Evaluation (Pitfalls: spurious correlation)
My favorite example is to do the following:*
1. Get data on all the fires in San Francisco for the last ten years.
2. Correlate the number of fire engines at each fire and the damages in dollars at each fire.
Conclusion: the more fire engines you have, the greater will be the likely dollar value of the damage. Therefore, get rid of the fire engines and you'll have less damage.
* From William C. Burns, "Spurious Correlations" (http://www.burns.com/wcbspurcorl.htm)
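A tiny simulation makes the point vivid. Everything below is fabricated for illustration (it is not Burns's data): fire severity drives both the number of engines dispatched and the dollar damage, so the two correlate strongly even though engines do not cause damage.

```python
# Simulated version of the fire-engine example.  All numbers are made up:
# a single confound (fire severity) drives both variables.
import random
from statistics import correlation   # Python 3.10+

random.seed(0)
sizes   = [random.uniform(1, 10) for _ in range(1000)]            # fire severity (the confound)
engines = [s + random.gauss(0, 1) for s in sizes]                 # bigger fires draw more engines
damage  = [50_000 * s + random.gauss(0, 20_000) for s in sizes]   # bigger fires do more damage

print(f"corr(engines, damage) = {correlation(engines, damage):.2f}")
# Strongly positive -- yet sending fewer engines would not reduce damage,
# because fire severity is what drives both variables.
```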
Impact Evaluation
When properly conducted, an impact evaluation can deal with a number of issues an outcome evaluation ignores:
- Maturation effects and seasonality
- The potential influence of confounds/population attributes, through randomization
- Reasonably minor differences between treatment groups, through differencing
Impact Evaluation (Seasonality/maturation and group differences)
[Chart: F&V consumption (y-axis) from winter to summer (x-axis) for intervention and comparison groups; both rise with the season, annotated "No real difference"]
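The arithmetic behind "differencing" is simple, and the sketch below walks through it with illustrative numbers chosen to mirror the chart: both groups eat more produce in summer, so the raw pre/post change overstates the program, while subtracting the comparison group's change strips the shared seasonal trend out. This is the generic difference-in-differences logic, not a description of any particular SNAP-Ed analysis.

```python
# Sketch of the differencing logic an impact evaluation relies on.
# All values are illustrative (servings of fruits and vegetables per day).
intervention_pre, intervention_post = 2.0, 2.8   # winter -> summer, program group
comparison_pre,   comparison_post   = 2.1, 2.8   # winter -> summer, comparison group

raw_change = intervention_post - intervention_pre            # includes the seasonal rise
impact = raw_change - (comparison_post - comparison_pre)     # seasonal rise subtracted out

print(f"raw pre/post change in the program group: {raw_change:.1f} servings")
print(f"difference-in-differences estimate of impact: {impact:.1f} servings")
```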
Impact Evaluation (Confounds, not to be confused with stratification)

                Treatment   Comparison   Row Total
Second Grade        5           95          100
Eighth Grade       95            5          100
Column Total      100          100          200

How can we distinguish the effects of the treatment from the effects of being an adolescent?
Impact Evaluation (Confounding confounds through random assignment)

                Treatment   Comparison   Row Total
Second Grade       45           55          100
Eighth Grade       55           45          100
Column Total      100          100          200

Groups are not perfectly comparable by grade, but this is a situation we can deal with!
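A quick way to see why random assignment defuses the confound in the first table is to simulate it: each student gets a coin flip, so grade level cannot pile up in one arm by design, only by chance. The sketch below is illustrative; counts will wobble around 50/50 much as in the second table.

```python
# Sketch: randomly assign 100 second graders and 100 eighth graders to
# treatment vs. comparison and tabulate the result.  Purely illustrative.
import random
random.seed(1)

def treated_count(n: int) -> int:
    """How many of n students land in the treatment arm under a fair coin flip."""
    return sum(random.random() < 0.5 for _ in range(n))

second_grade_treated = treated_count(100)
eighth_grade_treated = treated_count(100)
print(f"2nd grade: {second_grade_treated} treatment / {100 - second_grade_treated} comparison")
print(f"8th grade: {eighth_grade_treated} treatment / {100 - eighth_grade_treated} comparison")
# Not perfectly balanced -- but any imbalance is chance, not grade level choosing its own arm.
```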
Impact Evaluation (Randomization within matched pairs)
[Scatter plot: height (y-axis) versus age (x-axis) for students in Town 1, Town 2, and Town 3]
Do you see a couple of Tylenol moments on the horizon?
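The mechanics of randomization within matched pairs can be sketched in a few lines: order the units on the worrisome covariate, pair off neighbors, then flip a coin inside each pair so each arm gets one member of every pair. The ages below are illustrative placeholders, not data from the towns on the slide.

```python
# Sketch of randomization within matched pairs on a single covariate (age).
# Ages are illustrative only.
import random
random.seed(2)

ages = sorted([7, 8, 8, 9, 12, 13, 13, 14])          # students pooled across towns
pairs = [ages[i:i + 2] for i in range(0, len(ages), 2)]

treatment, comparison = [], []
for a, b in pairs:
    if random.random() < 0.5:
        treatment.append(a); comparison.append(b)
    else:
        treatment.append(b); comparison.append(a)

print("treatment ages: ", treatment, " mean:", sum(treatment) / len(treatment))
print("comparison ages:", comparison, " mean:", sum(comparison) / len(comparison))
# Each arm receives one member of every pair, so the age distributions stay close.
```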
Impact Evaluation (Quasi-experimental Design)
- As we saw in the last slide, sometimes random assignment is just not practical (or even ethical!). In such cases we must make do with naturally occurring experiments.
- In the case of the last slide, what should we do? Should we drop one town, or go find some more? If we add more, what will happen to the number of students? Up or down?
- Sorry. We've simply traded one set of problems for another. No easy way out.
Universal Considerations: Appropriate Sample Frame
- Our sample frame places a border around everything we do. We sample properly so that we can make valid inferences from our sample to the population. But the population is defined by the sample frame!!!
- A good sample frame must:
  - Be realistically obtainable and accurate ("Are these telephone numbers still likely to be valid? What's the issue if some sizable percentage is not?")
  - Fully describe your population of interest without leaving anyone (or very few) out
Universal Considerations: Appropriate Sample Frame*
* The only population of women about whom I can make reasonably valid inferences
Universal Considerations: Response Rates and Completion Rates
- High response rates are needed to avoid undermining your carefully crafted sampling frame by introducing response bias
- Requires persistence and will not make you popular. Get used to it! 80% is the OMB standard.
- The survey data collector as the 21st century's Fuller Brush Man/Woman
Universal Considerations: Response Rates and Completion Rates
- A completion often means obtaining responses to a minimum number of key questionnaire items. Here we take it to mean the completion of both pre- and post-intervention surveys for a given respondent.
- From your experience, might there be a systematic difference between individuals who choose to stick with a program and those who do not? Which group is likely to exhibit the better outcomes? Which group is likely to have the most favorable view of your intervention?
Universal Considerations: Response Rates and Completion Rates
Intention-to-treat: now this is a tough, tough data collection.
- Why is it tough? Why must we bother?
- How do we incorporate these partial data with our completed (pre/post) records? Hint: think dosage. (See the sketch below.)
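The bookkeeping that intention-to-treat implies can be sketched very simply: everyone who was assigned stays in the dataset, dropouts included, and the number of sessions actually attended (the dosage) is recorded as a variable rather than used as a filter. The field names and records below are hypothetical, not an actual data layout.

```python
# Hedged sketch of intention-to-treat bookkeeping.  Field names and records are
# hypothetical; "sessions_attended" stands in for dosage.
participants = [
    {"id": 1, "assigned": "treatment",  "sessions_attended": 8, "pre": 2.0, "post": 2.7},
    {"id": 2, "assigned": "treatment",  "sessions_attended": 3, "pre": 2.2, "post": 2.3},
    {"id": 3, "assigned": "treatment",  "sessions_attended": 0, "pre": 1.9, "post": None},  # dropped out
    {"id": 4, "assigned": "comparison", "sessions_attended": 0, "pre": 2.1, "post": 2.2},
]

# Intention-to-treat: analyze by *assignment* and chase the missing post-test,
# rather than quietly dropping record 3 -- otherwise only the most motivated remain.
itt_treatment = [p for p in participants if p["assigned"] == "treatment"]
followed_up   = [p for p in itt_treatment if p["post"] is not None]
print(f"assigned to treatment: {len(itt_treatment)}; with a post-test so far: {len(followed_up)}")
```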
Justifying Your Existence with Four Types of Evaluation
- Process
- Formative (Evidence-based? Don't forget step five!)
- Outcome
- Impact
Each type of evaluation progressively brings greater:
- Rigor
- Inconvenience
- Expense
So why do it?
Survey Research: Because It Pays!
Nothing looks better on a grant application than evidence.
Pay for Success
- "The Administration is continuing to invest in Pay for Success to support evidence-based innovation at the State and local levels. In the Pay for Success model, philanthropic and other private investors provide up-front funding for preventive services and the government does not pay unless and until there are results" [1]
[1] http://www.gpo.gov/fdsys/pkg/budget-2015-PER/pdf/BUDGET-2015-PER.pdf, p. 68