But brush off real practical difficulties

Lessons from the States?


This week’s Education Select Committee report, ‘Great teachers: attracting, training and retaining the best’, strongly recommends ‘that the Department for Education seek to quantify, in a UK context, what scale of variation in teacher value-added equates to in terms of children’s later prospects. We further recommend that the Department develop proposals (based on consultation and a close study of systems abroad) for a pay system which rewards those teachers who add the greatest value to pupil performance. We acknowledge the potential political and practical difficulties in introducing such a system, but the comparative impact of an outstanding teacher is so great that we believe such difficulties must be overcome.’ (Paragraph 121)

Rewarding outstanding teachers sounds good and fair on the face of it. Teachers who perform well and improve pupils’ outcomes should, surely, be rewarded and incentivised. The Committee, though, was not wrong when it referred to ‘practical difficulties’.

If there were an accepted model for measuring the value teachers add, one that did so with a considerable degree of accuracy, stayed consistent over time, was absolutely fair and was not subject to random results, then there wouldn’t really be big ‘practical difficulties’. But that is not how things stand. Fairly measuring an individual teacher’s performance is a huge challenge and, yes, there are a number of practical problems.

To really understand the issues surrounding performance-related pay, you have to take a close look at what is happening in the United States. In the US performance-related pay is central to education reforms, yet there is no real agreement among academics on the best and fairest way to measure value added. Value-added measures estimate the average gains of pupils taught by a given teacher, instructional team or school. They are often the most important outcome in performance measurement systems that aim to reward or sanction teachers on the basis of their performance.
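To make the idea concrete, here is a minimal Python sketch of the crudest possible value-added figure: the average gain (end-of-year score minus start-of-year score) of each teacher’s pupils. The function `simple_value_added`, the teacher names and the scores are all invented for illustration. Real value-added models adjust for prior attainment and pupil characteristics; this one deliberately does not.

```python
from collections import defaultdict


def simple_value_added(records):
    """Average score gain (post minus pre) of each teacher's pupils.

    records: iterable of (teacher, pre_score, post_score) tuples.
    Hypothetical, unadjusted measure: no controls for pupil background.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for teacher, pre, post in records:
        totals[teacher] += post - pre
        counts[teacher] += 1
    return {t: totals[t] / counts[t] for t in totals}


# Invented example data: two pupils per teacher.
pupils = [
    ("Teacher A", 50, 58),
    ("Teacher A", 62, 66),
    ("Teacher B", 45, 57),
    ("Teacher B", 70, 72),
]
print(simple_value_added(pupils))  # {'Teacher A': 6.0, 'Teacher B': 7.0}
```

Teacher B comes out ahead here purely on one pupil’s unusually large gain, which is exactly the kind of fragility the rest of the article is concerned with.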

It is worth repeating what the NFER in the UK said in a paper in 1999, when the debate on value added was really beginning here in earnest: ‘What value added data cannot do is prove anything. Value added evidence is only part of the story of school effectiveness. The notion of a value added measure which tells you – and everyone else – how well your school or department or class is doing, and is also simple to calculate, understand and use, is a non-starter’.

A report for the US Department of Education, ‘Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains’ (July 2011), found ‘evidence that value-added estimates for teacher-level analyses are subject to a considerable degree of random error when based on the amount of data that are typically used in practice for estimation.’ It added that evidence suggests ‘that more than 90 percent of the variation in student gain scores is due to the variation in student-level factors that are not under control of the teacher’. Two kinds of noise are at work. First, there can be random differences across classrooms in unmeasured factors related to test scores, such as pupils’ abilities, background factors and other pupil-level influences. Second, there are what have been described as ‘idiosyncratic’ unmeasured factors that affect all students in specific classrooms, for example a barking dog on the test day or a particularly disruptive student in the class on the day.

Existing research has consistently found that teacher- and school-level averages of student test score gains can be pretty unstable over time. Studies in the States have found only moderate year-to-year correlations, ranging from 0.2 to 0.6, in the value-added estimates of individual teachers (McCaffrey et al. 2009; Goldhaber and Hansen 2008) or of small to medium-sized school grade-level teams (Kane and Staiger 2002b). As a result, there are significant annual changes in teacher rankings based on value-added estimates.
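The link between the ‘more than 90 percent’ figure and unstable rankings can be illustrated with a small simulation. This is a sketch under invented assumptions, not the model used in the US report: gains are standardised, 5% of their variance is assumed to come from the teacher, 5% from classroom-wide shocks on the day and 90% from pupil-level factors, with 25 pupils per class and 2,000 hypothetical teachers measured in two successive years. Under these assumptions the expected year-to-year correlation is 0.05 / (0.05 + 0.05 + 0.90/25), or about 0.37, inside the 0.2 to 0.6 range the studies report; the simulated figure should land close to that.

```python
import math
import random
import statistics

# Illustrative assumptions (not figures from the US report): share of the
# variance in a pupil's standardised gain due to the teacher, to
# classroom-wide shocks on test day, and to pupil-level factors.
TEACHER_VAR, CLASSROOM_VAR, PUPIL_VAR = 0.05, 0.05, 0.90


def estimate_year(true_effects, class_size, rng):
    """One year's value-added estimate per teacher: the mean gain of their class."""
    estimates = []
    for effect in true_effects:
        shock = rng.gauss(0, math.sqrt(CLASSROOM_VAR))  # e.g. a barking dog on test day
        gains = [effect + shock + rng.gauss(0, math.sqrt(PUPIL_VAR))
                 for _ in range(class_size)]
        estimates.append(statistics.fmean(gains))
    return estimates


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def top_fifth(estimates):
    """Indices of the teachers ranked in the top 20% on one year's estimates."""
    order = sorted(range(len(estimates)), key=lambda i: estimates[i], reverse=True)
    return set(order[: len(estimates) // 5])


def simulate(n_teachers=2000, class_size=25, seed=1):
    rng = random.Random(seed)
    # Each teacher's true effect stays fixed across both years.
    true_effects = [rng.gauss(0, math.sqrt(TEACHER_VAR)) for _ in range(n_teachers)]
    year1 = estimate_year(true_effects, class_size, rng)
    year2 = estimate_year(true_effects, class_size, rng)  # same teachers, new pupils
    top1, top2 = top_fifth(year1), top_fifth(year2)
    return pearson(year1, year2), len(top1 & top2) / len(top1)


r, stayed = simulate()
print(f"year-to-year correlation: {r:.2f}")
print(f"share of year-1 top-fifth teachers still in the top fifth: {stayed:.0%}")
```

Nothing about the simulated teachers changes between the two years; all of the movement in the rankings comes from which pupils they happened to teach and what happened on test day.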

Our government has actually stopped collecting what we call ‘contextual value-added’ data, in which pupils’ circumstances, social background and so on are supposed to be taken into account. So if we don’t know about these background variables, how can we account for them when measuring performance, one wonders?

Secondly, it is something of a challenge to disentangle an individual teacher’s effect on a pupil’s performance from the influence of other teachers. For example, if a pupil has a bad maths teacher, it doesn’t matter how good the physics teacher is: the chances are the pupil will not do so well in physics, and it won’t be the physics teacher’s fault.

Another problem is that in order to measure a pupil’s progress you have to test pupils regularly. Many believe that our pupils are over-tested, or that teachers are teaching to the test (which is bad), or both. Any performance pay system will hardly settle these ongoing concerns. And, of course, some subjects are not tested, though they are part of a child’s education and are valued. What do you do about the teachers who teach these subjects? Should the tests used to measure teacher performance be based only on external exams, rather than, say, self-assessment? If self-assessment is being encouraged, as it is in some quarters, won’t teachers who know that their pay and careers depend on positive pupil results be tempted to cheat or exaggerate?

Unions here and in the US are resistant to performance-related pay. Apart from the challenge of designing a system that is both transparent and fair, they say that teaching is a collective responsibility. Setting teacher against teacher in striving to win extra pay would destroy the teamwork so vital to the working of an effective school.

There is also the thorny issue of how you categorise teachers once you have measured their performance. Do you place them into outcome categories and, if so, how many? For example, should they be rated highly effective, effective, developing, ineffective and so on? In the United States, many states have already designated four or five categories. Those pushing for a minimum number of outcome categories believe that teacher performance must be adequately differentiated, a goal on which prior systems, most of which relied on dichotomous satisfactory/unsatisfactory schemes, fell short. In other words, the categories in new evaluation systems must reflect the variation in teacher performance, and that cannot be achieved with only a couple of categories. But the number of categories a teacher evaluation system uses has to depend on how well it can differentiate teachers’ performance with a reasonable degree of accuracy.

It may be possible under existing models of measurement to differentiate performance at the top and bottom of the distribution, but are they precise or accurate enough to differentiate clearly between the bulk of teachers in the middle? There must be some doubt about this, even if you factor in ‘observation’ of teachers’ work. On this latter point, it’s worth noting that most performance systems rely not just on tests but also on teacher observation, which advocates of performance-related pay claim can offset any built-in problems with value-added measurements.
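Whether a noisy measure can separate the middle of the distribution can also be sketched. The Python below assumes, purely for illustration, that measured scores correlate 0.6 with teachers’ true effectiveness. It sorts 4,000 hypothetical teachers into four equal-sized categories (the labels are borrowed from the example above) on both their true and their measured scores, then reports how often each teacher lands in the right category. The correlation, the number of teachers and the quartile cut-offs are all assumptions, not features of any real evaluation system.

```python
import math
import random

# Illustrative assumption, not a published figure: a teacher's measured score
# correlates 0.6 with their true effectiveness (both standardised).
CORRELATION = 0.6
LABELS = ["ineffective", "developing", "effective", "highly effective"]


def quartile_category(values):
    """Map each value to a category 0-3 by its quartile rank within the list."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    cats = [0] * len(values)
    for rank, i in enumerate(order):
        cats[i] = rank * 4 // len(values)
    return cats


def agreement_by_category(n_teachers=4000, seed=7):
    """Share of teachers in each true category whom the noisy measure also puts there."""
    rng = random.Random(seed)
    true = [rng.gauss(0, 1) for _ in range(n_teachers)]
    noise_sd = math.sqrt(1 - CORRELATION ** 2)  # keeps measured scores at unit variance
    measured = [CORRELATION * t + rng.gauss(0, noise_sd) for t in true]
    true_cat, measured_cat = quartile_category(true), quartile_category(measured)
    result = {}
    for c, label in enumerate(LABELS):
        members = [i for i in range(n_teachers) if true_cat[i] == c]
        hits = sum(1 for i in members if measured_cat[i] == c)
        result[label] = hits / len(members)
    return result


for label, share in agreement_by_category().items():
    print(f"{label:>16}: {share:.0%} placed in their true category")
```

Under these assumptions, agreement tends to be noticeably higher in the two outer categories than in the two middle ones: the middle bands are narrow and a teacher can be misplaced in either direction, while the outer bands are open-ended.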

It is worth recalling at this juncture what the methodologist Donald T. Campbell said thirty years ago, when he framed what he called a ‘law’ of performance measurement: “The more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.”

The report from the Select Committee doesn’t begin to explore the real practical issues and difficulties raised by performance-related pay. It simply suggests, with considerable insouciance, that they should be overcome. Talk about passing the buck!



