SELECT COMMITTEE FAVOUR PERFORMANCE RELATED PAY
But brush off real practical difficulties
Lessons from the States?
This week’s Education Select Committee report, ‘Great teachers: attracting, training and retaining the best’, strongly recommends ‘that the Department for Education seek to quantify, in a UK context, what scale of variation in teacher value-added equates to in terms of children’s later prospects. We further recommend that the Department develop proposals (based on consultation and a close study of systems abroad) for a pay system which rewards those teachers who add the greatest value to pupil performance. We acknowledge the potential political and practical difficulties in introducing such a system, but the comparative impact of an outstanding teacher is so great that we believe such difficulties must be overcome.’ (Paragraph 121)
Rewarding outstanding teachers sounds good and fair on the face of it. Teachers who perform well and improve pupils’ outcomes should be rewarded and incentivised, surely. The Committee, though, was not wrong when it referred to ‘practical difficulties’.
If it were accepted that there was one model for measuring the value teachers add, and that this model did so accurately and consistently over time, was absolutely fair and was not subject to random results, then there wouldn’t really be big ‘practical difficulties’. But that is not how things stand. Fairly measuring an individual teacher’s performance is a huge challenge and, yes, there are a number of practical problems.
To really understand the issues surrounding performance related pay you have to take a close look at what is happening in the United States. In the US, where performance related pay is central to education reforms, there is no real agreement among academics on the best and fairest way to measure value added. Value-added measures capture the average gains of pupils taught by a given teacher, instructional team or school. They are often the most important outcomes for performance measurement systems that aim to offer rewards and sanctions focused on teachers’ performance.
It is worth repeating what the NFER in the UK said in a paper in 1999, when debate on value added was really beginning here in earnest: ‘What value added data cannot do is prove anything. Value added evidence is only part of the story of school effectiveness. The notion of a value added measure which tells you – and everyone else – how well your school or department or class is doing, and is also simple to calculate, understand and use, is a non-starter’.
A report for the US Department of Education, ‘Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains’ (July 2011), found that there is ‘evidence that value-added estimates for teacher-level analyses are subject to a considerable degree of random error when based on the amount of data that are typically used in practice for estimation.’ It added that evidence suggests ‘that more than 90 percent of the variation in student gain scores is due to the variation in student-level factors that are not under control of the teacher’. Two sources of error are at work. First, there can be random differences across classrooms in unmeasured factors related to test scores, such as pupils’ abilities, background factors and other pupil-level influences. Secondly, there are what have been described as ‘idiosyncratic’ unmeasured factors that affect all students in a specific classroom, such as a barking dog on the test day or a particularly disruptive student in the class that day. Existing research has consistently found that teacher- and school-level averages of student test score gains can be pretty unstable over time. Studies in the States have found only moderate year-to-year correlations, ranging from 0.2 to 0.6, in the value-added estimates of individual teachers (McCaffrey et al. 2009; Goldhaber and Hansen 2008) or small to medium-sized school grade-level teams (Kane and Staiger 2002b). As a result, there are significant annual changes in teacher rankings based on value-added estimates.
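To make the instability point concrete, here is a minimal simulation sketch. It is not the method used in the report or the studies cited above. Every number in it is an assumption chosen for illustration: 1,000 hypothetical teachers, classes of 25, and pupil-level noise with five times the spread of true teacher effects (so teachers account for under 4 percent of the variation in pupil gains). It also ignores classroom-wide shocks such as the barking dog, which would make matters worse.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation, written out so the sketch runs on older Pythons."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_year(true_effects, pupil_sd, class_size, rng):
    """One year's value-added estimate per teacher: the mean gain of their class."""
    estimates = []
    for effect in true_effects:
        gains = [effect + rng.gauss(0, pupil_sd) for _ in range(class_size)]
        estimates.append(statistics.fmean(gains))
    return estimates

def quintiles(values):
    """Map each value to its quintile (0 = bottom fifth, 4 = top fifth)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bands = [0] * len(values)
    for rank, i in enumerate(order):
        bands[i] = rank * 5 // len(values)
    return bands

# All of the following values are illustrative assumptions, not real data.
rng = random.Random(0)
N_TEACHERS = 1000
CLASS_SIZE = 25
TEACHER_SD = 1.0   # spread of true teacher effects
PUPIL_SD = 5.0     # spread of pupil-level factors outside the teacher's control

# The same teachers, with unchanged true effects, measured in two different years.
true_effects = [rng.gauss(0, TEACHER_SD) for _ in range(N_TEACHERS)]
year1 = simulate_year(true_effects, PUPIL_SD, CLASS_SIZE, rng)
year2 = simulate_year(true_effects, PUPIL_SD, CLASS_SIZE, rng)

r = pearson(year1, year2)
moved = sum(a != b for a, b in zip(quintiles(year1), quintiles(year2))) / N_TEACHERS

print(f"year-to-year correlation: {r:.2f}")
print(f"share of teachers changing quintile: {moved:.0%}")
```

Under these assumptions the sampling noise in a class average (variance 25/25 = 1) is as large as the variance of true teacher effects (1), so in theory the year-to-year correlation is about 0.5, inside the 0.2 to 0.6 range quoted above, even though no teacher in the simulation actually got better or worse.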
Our government has actually stopped collecting what we call ‘contextual value-added’ data, where the students’ circumstances, social background etc. are supposed to be taken into account. So if we don’t know about these background variables, how can we account for them when measuring performance, one wonders?
Secondly, it is something of a challenge to disaggregate an individual teacher’s effect on a pupil’s performance from other teachers’ influence. For example, if a pupil has a bad maths teacher, it doesn’t matter how good the physics teacher is: the chances are the pupil will not do so well in physics, and it won’t be the physics teacher’s fault.
Another problem is that in order to measure a pupil’s progress you have to test pupils regularly. Many believe that either our pupils are over-tested or that teachers are teaching to the test (which is bad), or both. A performance pay system will hardly settle these ongoing concerns. And, of course, some subjects are not tested, though they are part of a child’s education and are valued. What do you do about the teachers who teach these subjects? Should the tests used to measure teacher performance be based only on external exams rather than, say, self-assessment? If self-assessment is being encouraged, which it is in some quarters, won’t teachers, who know that their pay and careers depend on positive pupil results, be tempted to cheat or exaggerate?
Unions here and in the US are resistant to performance related pay. Apart from the challenge of designing a system that is both transparent and fair, they say that teaching is a collective responsibility. To set teacher against teacher in striving to win extra pay would be destructive of the notion of teamwork so vital to the working of an effective school.
There is also the thorny issue of how you categorise teachers once you have measured their performance. Do you place them into outcome categories and, if so, how many? For example, should they be rated highly effective, effective, developing, ineffective, etc.? In the United States, many states have already designated four or five categories. Those pushing for a minimum number of outcome categories believe that teacher performance must be adequately differentiated, a goal on which prior systems, most of which relied on dichotomous satisfactory/unsatisfactory schemes, fell short. In other words, the categories in new evaluation systems must reflect the variation in teacher performance, and that cannot be accomplished when there are only a couple of categories. The number of categories a teacher evaluation system employs has to depend on how well it can differentiate teachers’ performance with a reasonable degree of accuracy.
It may be possible under existing measurement models to differentiate performance at the top and bottom of the distribution, but are they precise or accurate enough to differentiate clearly between the bulk of teachers in the middle of the distribution? There must be some doubt about this even if you factor in ‘observation’ of teachers’ work. On this latter point it’s worth noting that most performance systems rely not just on tests but also on teacher observation, which advocates of performance related pay claim can offset any in-built problems with value-added measurements.
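The worry about the middle of the distribution can be illustrated with a rough sketch. It uses invented numbers and not any real evaluation model: 5,000 hypothetical teachers, classes of 25, test scores only (no observation), and pupil-level noise with five times the spread of true teacher effects. It sorts teachers into five bands by their true effect and by their estimated value added, then asks how often the two agree.

```python
import random
import statistics

def quintiles(values):
    """Map each value to its quintile (0 = bottom fifth, 4 = top fifth)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bands = [0] * len(values)
    for rank, i in enumerate(order):
        bands[i] = rank * 5 // len(values)
    return bands

# All of the following values are illustrative assumptions, not real data.
rng = random.Random(1)
N_TEACHERS = 5000
CLASS_SIZE = 25
TEACHER_SD = 1.0   # spread of true teacher effects
PUPIL_SD = 5.0     # spread of pupil-level factors outside the teacher's control

true_effects = [rng.gauss(0, TEACHER_SD) for _ in range(N_TEACHERS)]

# Estimated value added: the average gain of one class of pupils per teacher.
estimates = []
for effect in true_effects:
    gains = [effect + rng.gauss(0, PUPIL_SD) for _ in range(CLASS_SIZE)]
    estimates.append(statistics.fmean(gains))

true_band = quintiles(true_effects)
estimated_band = quintiles(estimates)

# For each true band, the share of its teachers placed in the same band
# by the noisy estimate.
match_rate = []
for band in range(5):
    members = [i for i in range(N_TEACHERS) if true_band[i] == band]
    hits = sum(estimated_band[i] == band for i in members)
    match_rate.append(hits / len(members))

for band, rate in enumerate(match_rate):
    print(f"true band {band}: placed in the correct band {rate:.0%} of the time")
```

In a set-up like this the top and bottom bands are identified more reliably than the middle one. That is partly because a middling teacher can be pushed into the wrong band by noise in either direction, while a teacher at the extreme can only be misplaced one way. How large the gap is depends entirely on the assumed noise level.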
It is worth recalling at this juncture what the methodologist Donald T. Campbell said thirty years ago when he framed what he called a ‘law’ of performance measurement: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”
The report from the Select Committee doesn’t begin to explore the real practical issues and difficulties raised by performance related pay. It simply suggests, with considerable insouciance, that they should be overcome. Talk about passing the buck!