September 12, 2016
Can international exam rankings tell us how to improve American education?
Interpreting cross-country differences can be difficult – and illuminating
Students sitting for a university entrance exam, Belgrade, Serbia, June 2008.
Education experts are looking forward to December this year for an updated report card on where American 15-year-old students stack up against the rest of the world. That’s when the Program for International Student Assessment (PISA) will release results of its 2015 test administered to half a million students across 71 countries.
If the last release three years ago was any indication, the new numbers will prompt a flurry of press releases and hot takes in the media. In 2013, when the results showed the U.S. maintaining its position in the middle of the pack, some fretted that the U.S. wasn’t making up ground against other countries. Education Secretary Arne Duncan called the results “a picture of educational stagnation” and cautioned against complacency.
Teachers unions argued that flatlining test scores were an indication that reform efforts with a heavy emphasis on testing had failed, while reform advocates argued that the results only showed the need to double down. Others warned against overinterpreting the results, which might be less indicative than they seem thanks to socioeconomic disparities between countries and the natural shortcomings of examination design.
An article from this summer’s issue of the Journal of Economic Perspectives surveys the recent research on international education comparisons to try to make sense of the numbers.
In “The Importance of School Systems: Evidence from International Differences in Student Achievement” (PDF), Ludger Woessmann argues that test score differentials between countries are meaningful, but must be interpreted with appropriate caution.
His previous research, published in the journal Science, found that scores on standardized tests like the PISA and its predecessors can help explain trends in GDP growth over the last 40 years, including the surprisingly strong growth of the East Asian “Tigers” like South Korea and the much slower growth in Latin America. A similar pattern holds across the 50 U.S. states.
Standardized test scores do a better job of explaining GDP trends than traditional measures like years of schooling. This suggests that education quality may matter as much as quantity, and that the test score differentials aren’t just an artifact of test-taking ability or culturally biased questions – they appear to track factors that influence a country’s economic fortunes.
Woessmann also notes that two different international standardized tests, the PISA and a more curriculum-based test called the TIMSS targeted at eighth graders, have roughly the same rank ordering of countries. Of the 28 countries that participated in both tests in 2011 and 2012, average math and science scores were highly correlated between the two tests. This is more assurance that the tests are telling us something real about each country’s student body.
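The cross-test check described above can be sketched in a few lines. The country scores below are invented for illustration – they are not real PISA or TIMSS results – and the point is simply how a rank correlation compares two orderings:

```python
# Hypothetical illustration: do two tests rank countries the same way?
# The scores below are made up, NOT actual PISA/TIMSS data.

def rankdata(values):
    """Assign ranks (1 = lowest) to a list of values; no tie handling needed here."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation: the Pearson correlation of the rank vectors."""
    rx, ry = rankdata(xs), rankdata(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Invented average math scores for five countries on two different tests
pisa  = [520, 495, 470, 555, 430]
timss = [530, 500, 480, 560, 445]

print(round(spearman(pisa, timss), 2))  # → 1.0 (identical country ordering)
```

A correlation near 1 means the two tests order the countries almost identically, which is the sense in which the PISA and TIMSS results corroborate each other.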
PISA results are broken out for individual students who answer questions about their backgrounds after the test, so researchers can see how scores correlate with factors like parents’ education, immigrant status, and even the number of books in a student’s home. These data are combined with information about the students’ schools and the educational institutions in place in their home districts, like whether exit exams are required for graduating seniors and how teachers’ salaries are decided.
Some striking patterns emerge from simple comparisons using this individual-level data. Students whose parents have less education and who do not speak the local language at home tend to do worse, which explains much of the immigrant-native gap in most countries (one study found that children of Turkish immigrants outside Turkey tend to do better than Turkish students in Turkey). Private school operation and accountability measures like exit exams and lesson monitoring are associated with better student outcomes.
Access to textbooks and longer instructional days are both linked to student achievement, but direct measures of educational resources like expenditure per student don’t explain very much. Woessmann’s analysis of the 2003 PISA results shows that family background factors and institutional factors seem to explain much more of the cross-country disparities than data on school resources.
The challenge is in figuring out how much we can conclude from these patterns. Many have been inspired to emulate the highest-performing countries like Finland, South Korea, and Poland; an endless stream of education officials from around the world has been dispatched to Finland to examine its highly touted school system up close.
But the correlations identified in the data don’t necessarily tell us what would work in the U.S., because the policies in place in each country might reflect something in that country’s values or character that can’t be replicated elsewhere.
Woessmann explains that this evidence isn’t as convincing as the results of a randomized experiment would be – one in which school systems in different countries were assigned to adopt different policies like exit exams, central control of curriculum design, and school choice. If policies were randomly assigned and we still saw major disparities between groups of countries, that would be an excellent indication of the best practices for an educational system.
Unfortunately, massive randomized experiments are quite rare in the world of education, so researchers have to use other methods to make the case that a certain practice or policy is actually the cause of high student achievement, and not the effect of some other factor.
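The confounding problem at stake here can be illustrated with a toy simulation (every number in it is invented): if richer school systems are both more likely to adopt a policy and higher-scoring to begin with, a naive cross-country comparison overstates the policy’s effect, while random assignment recovers it:

```python
# Toy simulation (all numbers invented): why cross-country correlations can
# overstate a policy's effect. Here, wealthier systems adopt the policy more
# often AND score higher anyway, so the naive gap mixes both influences.
import random

random.seed(0)

def simulate(randomized):
    """Return the score gap between systems with and without the policy."""
    with_policy, without = [], []
    for _ in range(10_000):
        wealth = random.gauss(0, 1)          # hidden confounder
        if randomized:
            policy = random.random() < 0.5   # assigned by coin flip
        else:
            policy = wealth > 0              # richer systems adopt the policy
        true_effect = 5.0                    # the policy's real boost
        score = (500 + 20 * wealth
                 + (true_effect if policy else 0.0)
                 + random.gauss(0, 10))
        (with_policy if policy else without).append(score)
    return sum(with_policy) / len(with_policy) - sum(without) / len(without)

print(f"observational gap: {simulate(randomized=False):.1f}")  # far above 5
print(f"randomized gap:    {simulate(randomized=True):.1f}")   # close to 5
```

The observational gap bundles the policy’s real effect with the wealth difference between adopters and non-adopters; randomization breaks that link, which is why researchers lacking experiments must find other ways to isolate causal effects.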
Woessmann surveys a few studies that suggest researchers are on to something when they note the importance of education institutions and organizations, although nothing is as rock solid as a transnational experiment. One study took advantage of the ways that instruction time is divided between subjects at different grade levels in the same school to conclude that increased instruction time in a given subject boosted scores on that part of the 2009 PISA test. Other evidence suggests that increased instructional time can help reduce disparities between students with different numbers of books at home.
Another strand of research uses the differential application of exit exams across different parts of Germany and different subject areas to assess the causal impact of exams on student achievement there, finding causal effects that are smaller than the simple cross-country correlations would indicate, but still sizeable. A newer approach using data on the same students assessed year after year found that achievement among young students aged 5-8 grew more each year in higher-performing systems like Vietnam’s.
Just to add to the complexity, Woessmann notes that some institutional features seem to have opposite effects in different types of countries. Giving schools more academic autonomy, for instance, seems to have a beneficial effect in high-income countries but a detrimental one in lower-income countries.
Certain types of countries with more urban populations or more diverse linguistic cultures might benefit from a different set of policies, so even the most rigorous cross-country comparisons don’t necessarily tell us what will work in the U.S. education system. Even so, policymakers shouldn’t be too quick to dismiss the lessons from international standardized test results.
“The Importance of School Systems: Evidence from International Differences in Student Achievement” appears in the Summer 2016 issue of the Journal of Economic Perspectives.