After exploring the ‘evaluation deficit’ that exists in schools in the first half of this two-part blog series, this second half looks at how we can improve current evaluation practices.
ImpactEd’s Owen Carter explores how we might be able to better understand what works in supporting young people to develop wider skills and competencies. He considers how we can be more rigorous and transparent in how we evaluate the impact of interventions on outcomes beyond academic exams.
In Part One of this blog, I outlined why evaluating non-academic outcomes can be so challenging, why it is worth doing so, and some of the issues in relation to measurement. Here, I’ll suggest a few steps that I think could contribute to better and more consistent practice across the sector in evaluating a wider education skillset.
One of the issues with evaluating wider skills is simple inconsistency. One person’s ‘grit’ is another’s ‘resilience’ is another’s ‘tenacity’. Some organisations refer to character, others to values, and still others talk about non-cognitive or social-emotional skills.
My view is that we are unlikely ever to agree on a common terminology that satisfies everyone. But this shouldn’t stop us arguing for better and more consistent measurement. In the field of well-being, instruments like the Warwick-Edinburgh Mental Wellbeing Scale have been both academically tested and widely adopted. That is slightly less true of instruments such as the Motivated Strategies for Learning Questionnaire or the Self-Efficacy Questionnaire for Children, although our own work in this area is hopefully helping to address that.
The point here is that if organisations or schools really want wider outcomes to be taken seriously, they need to work together both to use well-tested, validated measures and to use some of the same ones. The current fragmentation in the system, with hundreds of organisations designing myriad questionnaires for their own use, makes comparability near impossible and meaningful impact measurement very difficult. Compare this with GCSE grades, where, confusion about the 1-9 scale aside, most people have a common sense of what they mean and what they can be used for.
This will certainly not solve all our measurement problems, and we shouldn’t rule out creating our own measures entirely. There will also always be broader concepts, such as creativity, that are unlikely to translate neatly into a single measure. But the sector as a whole could benefit hugely from prioritising well-tested existing instruments and working together to create more aligned frameworks.
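To make that a little more concrete, here is a minimal sketch of what scoring a validated questionnaire might look like in practice. It assumes a 14-item instrument scored 1-5 per item; the column names, the range check and the ‘only score complete responses’ rule are illustrative assumptions for this example, not the official scoring guidance for any of the instruments named above.

```python
# Sketch: scoring a 14-item, 1-5 Likert questionnaire per pupil.
# Column names and the handling of missing items are illustrative assumptions,
# not official scoring guidance for any specific instrument.
import pandas as pd

ITEM_COLS = [f"item_{i}" for i in range(1, 15)]  # 14 hypothetical item columns

def score_questionnaire(responses: pd.DataFrame) -> pd.Series:
    """Sum the 14 item scores (each 1-5) into one total per pupil."""
    items = responses[ITEM_COLS]
    in_range = items.isin([1, 2, 3, 4, 5]) | items.isna()
    if not in_range.all().all():
        # Values outside 1-5 usually point to a data-entry problem.
        raise ValueError("Item responses must be 1-5 or missing.")
    # Leave pupils with missing items unscored rather than giving a misleading partial total.
    complete = items.notna().all(axis=1)
    return items.sum(axis=1).where(complete)
```

The point of standardising on a published instrument is that a total produced this way means the same thing for your school, for another school, and for the researchers who validated the scale.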
Too many evaluations rely solely on simple outcome indicators. If you ask someone whether they were satisfied with being involved in a programme, chances are they will say ‘yes’. And while this might allow us to make great claims with fantastically high percentages of success, how much it really tells us about impact is doubtful.
Alongside using better measures, then, I think we could also benefit from a more intelligent approach that looks at multiple indicators together. If they all point the same way, they might be telling us something; if they don’t, that is interesting in itself and worth investigating.
Fairly simple examples of this kind of practice include:
At its most sophisticated, this approach can allow for some really powerful analysis. For instance, Kautz and Zanoni used a range of administrative data provided by schools to evaluate the OneGoal programme. Because both cognitive and non-cognitive skills affect behaviour and performance, they were able to use standardised test scores to control for cognitive ability and to measure wider skills indirectly through their effect on outcomes such as student grades and absences.
This kind of analysis is likely to be beyond most organisations. But the basic point stands: looking at multiple meaningful metrics together is likely to be more useful, and to provide more reliable evidence, than analysing any one outcome alone.
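As a far simpler version of the same idea, the sketch below looks at two outcomes side by side while controlling for a prior-attainment score as a rough proxy for cognitive ability. The column names (participated, prior_score, absences, gpa) are entirely hypothetical, and this is an illustration of the general approach rather than a reconstruction of the Kautz and Zanoni analysis.

```python
# Sketch: examining two outcomes side by side while controlling for prior attainment.
# All column names ("participated", "prior_score", "absences", "gpa") are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

def fit_outcome_models(df: pd.DataFrame) -> dict:
    """Fit one OLS model per outcome and return the fitted results keyed by outcome."""
    results = {}
    for outcome in ["absences", "gpa"]:
        # Programme participation effect, holding a crude prior-attainment measure fixed.
        results[outcome] = smf.ols(f"{outcome} ~ participated + prior_score", data=df).fit()
    return results

# Usage: models = fit_outcome_models(pupil_data)
# If the participation coefficients point the same way for both outcomes, that
# convergence is more persuasive than either result alone; if they diverge,
# that is worth investigating rather than glossing over.
```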
I’ve always liked Daniel Stufflebeam’s phrase that ‘the purpose of evaluation is to improve, not prove.’ Too often we feel under pressure to demonstrate impact, and so we make claims that might not be fully justifiable, or obscure areas of weakness.
When we are evaluating broader skills, we’re stronger when we are transparent. There are limitations to all measures in this domain. There is lots of work to be done developing the evidence base about what works best where. And there are genuine unanswered questions about how skills and knowledge should best interrelate.
As such, we should all be looking to take an approach to impact that is thoughtful, reflective and that acknowledges limitations as well as successes. Street League’s three golden rules are helpful here: don’t overclaim, give absolute numbers as well as percentages, and allow others to audit your evidence.
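The second of those rules is easy to build into everyday reporting. As a tiny, hypothetical illustration, a helper like the one below refuses to produce a percentage without the underlying numbers next to it; the function and its wording are mine, not Street League’s.

```python
def report_outcome(improved: int, total: int, outcome: str) -> str:
    """Report an outcome with absolute numbers alongside the percentage."""
    if total == 0:
        return f"No pupils were measured for '{outcome}'."
    pct = 100 * improved / total
    # "18 of 24 pupils (75%)" is harder to overclaim with than a bare "75%".
    return f"{improved} of {total} pupils improved on '{outcome}' ({pct:.0f}%)."

print(report_outcome(18, 24, "self-efficacy"))
# -> 18 of 24 pupils improved on 'self-efficacy' (75%).
```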
Impact really matters, as much for wider skills as for academic achievement. If we’re going to invest scarce time, money and energy in doing something, we want to be sure it’s making a difference – and, if it isn’t, to understand how we can change things to do more good in the future.
We’re keen to work more with organisations and schools developing shared approaches to impact measurement, and to work jointly towards some of the recommendations I’ve outlined here. If you’d like to find out more, or are interested in being part of that journey, do get in touch with me on Twitter @od_carter or email us: [email protected]