Michael Gove, the education secretary, is keen that England should measure its education outputs against international benchmarks, particularly the Programme for International Student Assessment (PISA), which is run by the OECD and was designed by Andreas Schleicher. Gove frequently refers to Schleicher and, of course, to PISA. He mentions the other significant benchmark, TIMSS, far less often (see below).

However, not everyone agrees that PISA (or the other benchmarks) is such an authoritative assessment. A 2011 report from the IPPR think tank says: ‘The sampling methods of international assessments have been criticised for being too small to reliably judge a whole system’s performance, and for being open to countries “gaming” the sample by excluding pupils who are likely to perform poorly’ (Hormann 2009, Mortimore 2009). These assessments also only provide system-level data, which makes it hard to apply the lessons at a more local level. The report adds that ‘Country-specific factors – including the nature of curriculum, testing and teaching – can mean some pupils are better prepared for the format of international assessments than others’.

PISA is an ambitious and expensive large-scale attempt to measure and compare literacy in reading, mathematics and science across a large number of countries. The first PISA survey was launched in 2000, and it has since been followed by surveys in 2003, 2006 and 2009. Many concerns have been raised about the comparability of educational test results from different countries in general, and in particular about the difficulty of producing test items that are culturally and linguistically neutral.

In this country, Professor Stephen Heppell has been a long-term critic of its methodology. Prais (2003), Goldstein (2004), Brown (2007), and Hopmann, Brinek & Retzl (2007) have also raised very specific concerns over the methodology. Over the last month it has come in for serious criticism from academics including Svend Kreiner and Hugh Morrison. Kreiner’s view is that PISA officials claim either that they know about the problems, that the problems have been solved, or that their analyses show that the rankings provided by PISA are robust to the model errors. But he counters: ‘The truths of such claims are not supported by evidence in the technical reports and our results suggest that the ranking is far from robust. If they want to restore the credibility of their results, it is PISA’s obligation to produce the evidence supporting their claims.’ Morrison says ‘the OECD’s claims in respect of its PISA project have scant validity given the central dependence of these claims on the clear separability of ability from the items designed to measure that ability.’

PISA’s comparison of countries relies on plausible student scores derived from the so-called Rasch model. The key issue is whether the scaling model used by PISA, and the way PISA tests and uses that model to compare student scores across countries, is reliable and consistent. In layman’s terms, is PISA comparing like with like? Significant doubts have been raised in this area, with the Rasch model itself coming under criticism. We already know that the rankings can be misleading, as more and more countries join and some high performers dip in and out; some countries don’t take part at all. Statistically insignificant differences between countries’ performances have also been exaggerated, and even Schleicher has urged politicians to be cautious in using the evidence to justify policies. Politicians tend to cherry-pick and miss important nuances in order to get their basic message across: that we are failing by international standards.
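For readers unfamiliar with it, the simplest (dichotomous) form of the Rasch model is sketched below. PISA’s operational scaling uses extensions of this basic model, so treat this as an illustration of the idea rather than PISA’s exact procedure. The symbols are illustrative notation, not PISA’s own: θ_n stands for the ability of pupil n, b_i for the difficulty of test item i, and X_ni is 1 if pupil n answers item i correctly.

$$
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)},
\qquad
\ln\frac{P(X_{ni} = 1)}{1 - P(X_{ni} = 1)} = \theta_n - b_i
$$

Because the log-odds of a correct answer are simply θ_n − b_i, the difference between two pupils’ log-odds is the same on every item. This separation of ability from the items appears to be the property Morrison’s criticism turns on. Comparing countries also assumes that each item’s difficulty b_i is the same everywhere; if an item is relatively harder in some countries than in others (so-called differential item functioning), PISA is no longer comparing like with like.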

However, John Jerrim of the IOE, who has himself raised concerns over PISA, says that criticisms implying it is useless as a benchmark are a ‘gross exaggeration’. He concedes that a number of valid points have been raised, and points to various ways in which PISA could be improved; according to Jerrim, the need for PISA to become a panel dataset, following children throughout school, which was raised by Harvey Goldstein, is a particularly important point. He also accepts that no data or test is perfect, PISA included, particularly when tackling a notoriously difficult task such as cross-country comparisons. But he says ‘to suggest it cannot tell us anything important or useful is very far wide of the mark. For instance, if one were to believe that PISA did not tell us anything about children’s academic ability, then it should not correlate very highly with our own national test measures. But this is not the case.’

But there is a more profound question that needs to be asked. To improve our students’ performance in PISA, we need to make sure that they are being better prepared in classrooms for the things that PISA actually measures. Is there any evidence that this is happening? Given the concentration on structural reforms in the first half of this Coalition administration, one could plausibly argue that the government has spent too much time on structures and too little on what happens in the classroom, or at least that it has left the latter very late in the day. Structural reforms may play a part in improving outcomes, but few believe that they alone can deliver transformative change.

In short, teaching quality, the curriculum, assessment and the overall domestic accountability framework will all affect our students’ performance in PISA tests and the subsequent rankings. There is a sneaking suspicion that the changes now being made are coming too late to have any impact on PISA.


So what are the benchmarks?

Programme for International Student Assessment (PISA): PISA is run by the OECD and takes place every three years. It is a sample survey that assesses 15–16 year olds in three areas: reading literacy, maths and science.

Trends in International Mathematics and Science Study (TIMSS): Run by the International Association for the Evaluation of Educational Achievement (IEA), TIMSS assesses 9–10 year olds and 13–14 year olds on their skills in maths and science. TIMSS takes place every four years and more than 50 countries participate. It is curriculum-focused and, as a result, tends to test pupils’ content knowledge rather than their ability to apply it.

Progress in International Reading Literacy Study (PIRLS): PIRLS assesses 9–10 year old pupils on their reading literacy. Using a similar design to TIMSS, it focuses on assessing pupils’ knowledge of curriculum content. It takes place every five years and there are currently 35 countries participating. PIRLS is also run by the International Association for the Evaluation of Educational Achievement.

