Aid donors and governments want sound evidence on the impacts of their spending on poverty and human development. The design and implementation of sound evaluations of impact can be costly, so they have been relatively rare in developing countries and, until recently, in World Bank programs. Instead, development policy has often been guided by methodologically flawed evaluations, anecdotal evidence, or development fashions.
Impact evaluations can make aid more effective
Impact evaluations are about understanding what works in development, under what circumstances, and why. They do not simply measure inputs, such as how many school meals were delivered in a school feeding program. Instead, they assess impacts on key development objectives, such as children’s nutritional status. By determining who gains from a program and by how much, impact evaluations can increase the effectiveness of aid.
Impact evaluations contribute to a culture of accountability and results-based lending
Impact evaluations help foster a culture of accountability and results-based lending within the Bank and within client governments in developing countries. They cannot be used to evaluate the effects of every program or intervention, especially where a policy is economy-wide. But credible impact evaluations should gradually become a central design element of effective policies.
Impact evaluations are essential for creating a “knowledge bank”
The number and quality of impact evaluations are gradually improving, and research is moving on to bigger questions. These include understanding why the same intervention can be successful in some settings but not in others, and how to make policies more effective. Impact evaluations allow credible cost-effectiveness comparisons across policies. For example, are learning outcomes best improved through reducing poverty, building schools and other education-based policies, improving early childhood development through health interventions, or building rural roads?
What we know
Designing convincing impact evaluations involves constructing a credible comparison group
One reason rigorous evaluations have been rare is that measuring the impact of policies is difficult. Impact evaluation aims to compare outcomes for individuals who participate in a program with an unobservable alternative: the outcomes they would have obtained if they had not participated. Because this counterfactual cannot be observed, it has to be proxied by the outcomes of a “comparison” group of similar individuals who did not receive the program. But any differences in characteristics between the two groups can result in misleading inferences about the program’s impacts. Two sets of techniques (often combined in practice) can be used to minimize such differences and construct credible comparison groups for rigorous evaluations:
- Experimental methods: Randomization involves selecting the beneficiaries of an intervention through a lottery. As in medical trials, potential beneficiaries are randomly divided into two groups: one that receives the program and one that does not. This is appealing because it removes systematic differences between recipients and non-recipients, so that in sufficiently large samples the differences in outcomes between the two can be credibly interpreted as the impact of the program (rather than of any confounding factors).
- Non-experimental methods: Randomization is not always feasible, particularly when working with government programs. And randomized evaluations can be deceptive about the gains from scaling up. With good data, it is possible to assess impacts without randomization. These quasi-experimental methods aim to remove all observable differences between beneficiaries and the comparison group of non-beneficiaries. If planning for the evaluation begins early, it is possible to collect baseline and post-intervention data on participants and non-participants, and a number of econometric methods can be used to correct for differences between these two groups. For instance, participants can be carefully matched with non-participants on the basis of observable, pre-intervention characteristics. Alternatively, changes over time in outcomes for non-random samples of participants and non-participants can be compared.
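The two approaches above can be illustrated with a small simulation. The sketch below is not drawn from any of the evaluations discussed here: the impact size, the common time trend, and the unobserved household trait are all invented for illustration. It compares three estimates of the same hypothetical program: a naive comparison of self-selected participants with non-participants, a comparison of changes over time for those same groups, and a comparison of lottery winners with lottery losers. The changes-over-time estimate works in this sketch only because participants and non-participants are assumed to share the same time trend.

```python
import random
from statistics import mean

TRUE_EFFECT = 2.0   # hypothetical program impact, chosen only for illustration
COMMON_TREND = 1.0  # hypothetical change over time that everyone experiences


def estimate_impacts(n=20000, seed=0):
    """Simulate households and return three estimates of the program impact."""
    rng = random.Random(seed)
    naive_t, naive_c = [], []    # follow-up outcomes, self-selected groups
    change_t, change_c = [], []  # follow-up minus baseline, self-selected groups
    rand_t, rand_c = [], []      # follow-up outcomes, lottery groups

    for _ in range(n):
        # A fixed trait (unobserved by the evaluator) that raises outcomes.
        trait = rng.gauss(0, 1)
        baseline = 10 + 3 * trait + rng.gauss(0, 1)
        no_program = 10 + 3 * trait + COMMON_TREND + rng.gauss(0, 1)
        with_program = no_program + TRUE_EFFECT

        # Scenario A: households with a higher trait are more likely to join.
        joins = trait + rng.gauss(0, 1) > 0
        outcome = with_program if joins else no_program
        (naive_t if joins else naive_c).append(outcome)
        (change_t if joins else change_c).append(outcome - baseline)

        # Scenario B: a lottery decides who receives the program.
        wins = rng.random() < 0.5
        (rand_t if wins else rand_c).append(with_program if wins else no_program)

    return {
        "naive": mean(naive_t) - mean(naive_c),
        "diff_in_diff": mean(change_t) - mean(change_c),
        "randomized": mean(rand_t) - mean(rand_c),
    }


if __name__ == "__main__":
    for name, value in estimate_impacts().items():
        print(f"{name:>12}: {value:.2f}")
```

In this setup the naive comparison overstates the impact because participants differ from non-participants in the unobserved trait, while the lottery and the changes-over-time comparison both come close to the assumed true effect.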
Impact evaluations require planning and careful design at an early stage of a project, extensive coordination, and a substantial budget
To be rigorous and convincing, impact evaluations require careful design, generally before an intervention is put in place. They require coordination between researchers, others involved in project preparation, and government counterparts. They also require careful supervision to ensure that the evaluation design is respected during program implementation. And they may require sizeable budgets (for baseline and follow-up surveys). These are prerequisites for successful evaluations—even if they cannot assure that every evaluation will deliver the expected results.
Credible, rigorous impact evaluations have large returns because they advance our knowledge about the effects of policies, have lessons for the design of interventions, and help ensure the political sustainability of successful programs.
Credible evaluations highlight what interventions work and why
In Ecuador and Mexico, cash transfers are made to poor households subject to a means-test. In both countries, random assignment among eligible communities and families was used during the roll-out of the program, and this forms the basis of evaluations of the effect of these programs on the cognitive development of children of pre-school age. Both evaluations showed large, significant effects of cash transfers on cognitive development, in particular among the poorest households. Low levels of cognitive development in early childhood have very serious consequences for school performance and labor market outcomes, and these evaluations show that carefully targeted cash transfers are one type of effective intervention.
An evaluation of the impacts on child health of piped water in India found lower prevalence and duration of diarrhea for children living in households with piped water as compared to a matched comparison group of households. Yet it also found striking differences in the child health gains according to family income and adult female education. The health gains largely bypass children in poor families, particularly when the mother is poorly educated. These findings highlight the importance of combining public investments in this type of infrastructure with other interventions in education and income poverty reduction.
In Bangladesh, families who own more than one-half acre of land are ineligible for most group-based microcredit programs. This rule was used to compare the outcomes of families whose land holdings were “just under” and “just above” the eligibility threshold. The evaluation showed large, positive effects of microcredit on household welfare. The results underscored the negative effects of credit constraints in developing countries and the potential of microcredit programs as a way of alleviating these constraints.
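The logic of comparing families on either side of an eligibility rule can be sketched with simulated data. The example below is a deliberately simplified illustration and does not reproduce the method or the data of the Bangladesh study: the welfare equation, the effect size, and the bandwidth are assumptions, and every eligible household is assumed to participate. It shows why restricting the comparison to households close to the one-half acre cutoff can recover an impact that a comparison of all eligible and ineligible households would miss.

```python
import random
from statistics import mean

LAND_CUTOFF = 0.5    # acres; households above this are ineligible (from the text)
TRUE_EFFECT = 10.0   # hypothetical welfare gain, chosen only for illustration


def threshold_comparison(n=40000, bandwidth=0.05, seed=1):
    """Compare eligible and ineligible households, overall and near the cutoff."""
    rng = random.Random(seed)
    households = []
    for _ in range(n):
        land = rng.uniform(0.0, 1.0)
        eligible = land <= LAND_CUTOFF
        # Assumption: welfare rises with land, so ineligible households are
        # better off on average even without the program.
        welfare = 100 + 20 * land + rng.gauss(0, 5)
        if eligible:
            welfare += TRUE_EFFECT
        households.append((land, eligible, welfare))

    def gap(rows):
        # Mean welfare of eligible households minus that of ineligible ones.
        return (mean(w for _, e, w in rows if e)
                - mean(w for _, e, w in rows if not e))

    # Keep only households within `bandwidth` acres of the cutoff.
    near = [h for h in households if abs(h[0] - LAND_CUTOFF) <= bandwidth]
    return {"all_households": gap(households), "near_cutoff": gap(near)}


if __name__ == "__main__":
    for name, value in threshold_comparison().items():
        print(f"{name:>15}: {value:.2f}")
```

In this sketch the comparison across all households shows almost no gain, because the ineligible are wealthier to begin with, while the comparison near the cutoff comes close to the assumed effect. A narrower bandwidth reduces the remaining bias from differences in land holdings but leaves fewer households to compare.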
In Burkina Faso, the impact of alternative approaches to school feeding was evaluated using a prospective randomized trial. Two schemes were compared: school meals, in which students are provided with lunch each school day, and take-home rations, which provide girls with 10 kg of cereal flour each month, conditional on a 90 percent attendance rate. Both led to increased enrollment for girls of about 5 percentage points. Absenteeism fell in families that had a large child labor supply, but increased in households with fewer children. Take-home rations had positive spillover effects: younger siblings of beneficiaries, aged between 12 and 60 months, showed an increase in their weight-for-age and weight-for-height. In contrast, school meals had no significant impact on the nutrition of younger children.
In Colombia and Cambodia, demand-side incentives such as school vouchers and scholarships have been shown to have large, positive effects on enrollment in lower secondary school. In Colombia, children who received the voucher were 10 percentage points more likely to finish 8th grade, and were also more likely to perform well on tests of academic achievement. In Cambodia, girls from poor families who received scholarships were more than 20 percentage points more likely to attend lower secondary school than they would have been in the absence of the program.
Chile Solidario, an innovative social protection system in Chile, targets households in extreme poverty and provides them with a two-year period of psycho-social support from a local social worker. During this phase the households are ensured preferential access to a system of monetary transfers and a system of social programs that meet their needs in terms of human capital endowments, housing, and income generation capacity. The evaluation exploits an exogenous geographic variation in the assignment of the program to find that the program improved education and health outcomes and increased take-up of cash assistance and social programs for housing and employment. The study provides suggestive evidence of the key role that close follow-up and psycho-social support has in enabling households to orient toward the future and make the best use of public programs and services.
In China, international and local researchers have collaborated to study the long-term impact of a major Bank lending operation aimed at dramatically reducing absolute poverty in one of China’s poorest rural regions using multi-sectoral interventions (including farming, animal husbandry, infrastructure, and social services). Only modest gains to mean consumption emerged in the longer term, but certain types of households gained more than others. The educated poor were under-covered by the community-based selection process, which greatly reduced overall impact.
In Mexico and Sri Lanka, experimental evaluations of the impact of grants to small firms suggest that these firms operate under substantial constraints in access to capital. In Mexico the grants generated large increases in profits, with the effects concentrated among firms that were more financially constrained. The estimated return to capital was at least 20 to 33 percent a month, three to five times higher than market interest rates. In Sri Lanka the average real returns to the grants were also substantially higher than market interest rates. These returns varied with entrepreneurial ability and household wealth, and were concentrated among firms owned by men. Impacts did not vary with measures of risk aversion or uncertainty.
Credible evaluations can influence public policies
In Mexico, PROGRESA, a conditional cash transfer program, made transfers to households if preschool children were taken for regular visits to health centers and school-aged children were enrolled in school. The program, introduced gradually among eligible communities, used a lottery to determine which areas would receive the program first. The randomized phase-in allowed researchers to convincingly show that PROGRESA improved child schooling and health. The PROGRESA evaluation resulted in the design of similar programs in almost every country in Latin America.
In China, it is estimated that the poverty head count would be nearly 15 percent lower in the absence of out-of-pocket spending for health. However, an evaluation of an insurance scheme for catastrophic health shocks in urban areas showed no effect of the program on health spending. Part of the problem appeared to be that providers were paid on a fee-for-service basis, which encouraged them to shift insured patients from basic care to high-tech, high-margin services and drugs. To discourage providers from over-providing care, a World Bank-supported project in rural areas worked on both the demand side, by making the village-based rural health insurance system more attractive through subsidized premiums, and the supply side, by introducing treatment protocols, drug lists, and training programs to reduce “demand inducement” by providers. An evaluation showed that the project significantly reduced catastrophic health spending.
In Cambodia, evaluation of a scholarship program aimed at increasing access and participation in lower secondary school found that there were diminishing returns to larger transfers, which led the administrators to expand coverage rather than the amount of the transfer during scale-up. In addition, the evaluation highlighted that many of the poorest children had dropped out of school prior to the secondary level, which led the government to pilot (and evaluate) a similar program targeted at the primary level.
In Argentina, matching techniques were used to estimate the impact of a program that provided support to families that had become unemployed during the severe economic crisis of 2002. The evaluation showed that the program reduced aggregate unemployment and extreme poverty. Together with earlier research on the impact of workfare programs, this evaluation has helped governments develop appropriate policies during macroeconomic shocks.
Credible evaluations help ensure political sustainability
The rigorous nature of the PROGRESA evaluation was one of the reasons the administration of President Fox supported a program inherited from the previous administration, an unusual outcome. The name of the program was changed (to Oportunidades), but the design was essentially unchanged. The program has continued under the administration of President Calderon. Much the same happened in Colombia, where a rigorously evaluated conditional cash transfer program survived a change of administration, in part because of the results of a credible impact evaluation.
Current and future research on impact evaluation
Increasing the number and quality of evaluations of similar interventions in different circumstances will help make policies more effective
The development community has demanded better evaluation as an input to results-based aid. The World Bank—with its global scope, unparalleled access to policy makers at the highest levels, and large volumes of lending for specific interventions—is ideally placed to encourage learning about development from impact evaluations. Much can be learned from the accumulation of a large number of evaluations, on similar interventions, carried out in different settings. The results can then be aggregated into a review of what can be generalized about the circumstances under which policies work best, and the types of policies and design that best achieve a given gain for a given cost.
The World Bank aims to bring together the lessons learned from these evaluations. The recently published Policy Research Report Conditional Cash Transfers: Reducing Present and Future Poverty reviews the theory and evidence on the effectiveness of programs that transfer cash to families who comply with conditions related to health and education behaviors. Reviews of other types of interventions are planned for upcoming years—as the volume of evaluations on a particular topic warrants a synthesis.
Increasing the number and quality of evaluations will help our understanding of the techniques that are appropriate in different circumstances
Building up the body of evaluations should also shed light on evaluation methods. Randomization may be the most convincing evaluation design, but it is not always possible or desirable. Understanding what other techniques work, under what circumstances, and with what biases is an important direction for future research.
Examples of ongoing impact evaluations in the World Bank’s research group include the following:
- What policies work best to extend care and support to AIDS patients? The HIV/AIDS epidemic is devastating the social fabric in many Sub-Saharan African countries. In addition to the human costs of the epidemic, the economic consequences are dire: families are condemned to poverty when breadwinners die, and communities struggle to cope with the burden of patients and orphaned children. A series of evaluations of the impact of providing antiretroviral drugs is being conducted in Burkina Faso, Ghana, Kenya, Mozambique, Rwanda, and South Africa.
- Can increased school-based management improve learning outcomes? It is often argued that decentralizing decision making in education from central ministries to school-based local management committees of parents and school administrators can result in improvements in learning outcomes. An ongoing randomized evaluation of school-based management in Pakistan directly tests the impact of policies that transfer responsibilities to local management committees.
- Do new roads linking poor isolated rural areas to the outside world promote local development and higher living standards? Despite a general consensus on the importance of rural roads, there is surprisingly little hard evidence on the size and nature of these benefits and on the contextual factors that influence outcomes. Ongoing evaluations in both Vietnam and Bangladesh examine these issues, focusing on the heterogeneity in impacts, and the nature of that heterogeneity, including the interactions with geographical, community, and household characteristics.
- To what extent can school fee reductions improve learning outcomes? The extent to which fees paid by households hold back learning in developing countries is a question of great policy importance. An ongoing quasi-experimental evaluation in Lesotho studies the impact of a nationwide policy that eliminated most fees at the primary school level.
- What are the gains from iron deficiency treatments on health and productivity outcomes? Micronutrient deficiencies, widespread in many developing countries, can have very large costs in terms of health, schooling, and wage-earning potential. An ongoing randomized evaluation in Indonesia tests the effect of iron supplementation on health, productivity, and wages.
Challenges in making impact evaluations more useful to policy makers
Looking forward, there is much to be done to ensure that the tools of impact evaluation are effectively mobilized to address the persistent gaps between what we know about development effectiveness and what we want to know. These knowledge gaps stem from distortions in the market for knowledge, arising from the fact that a large share of the benefits from an evaluation accrue to people not directly involved in the decision on whether to do the evaluation and how much to invest in it. Well-designed evaluations can help address these knowledge gaps and so be more relevant to the needs of practitioners. Support from governments and donors will generally be needed to ensure that socially optimal investments are made in knowledge generation through evaluation.
Making evaluations more relevant to the needs of practitioners will also require that researchers and evaluators:
- Strive to identify the most policy-relevant questions (including the case for intervention) and make sure those questions drive the agenda for evaluative research.
- Take a broader approach to the problems of internal validity, including heterogeneity of impacts across program participants and spillover effects, such that the comparison group no longer represents the counterfactual outcomes in the absence of the program.
- Strive to address the problems of external validity, notably how one can best learn from an evaluation of one or a few programs about what will happen when the intervention is scaled up or applied to a different setting.
Contact: Deon Filmer, email@example.com, 202-473-1303
Most World Bank research documents cited in this summary are available through the World Bank’s research archives at http://econ.worldbank.org/docsearch or the Bank-wide archives at http://www-wds.worldbank.org/. The word “processed” describes informally reproduced works that may not be commonly available through library systems.
1. For an overview of the theory and methods of impact evaluation see M. Ravallion. 2008. “Evaluating Anti-Poverty Programs.” In Handbook of Development Economics Volume 4, eds., P. Schultz and J. Strauss. Amsterdam: North-Holland.
2. On Ecuador:
C. Paxson and N. Schady. 2007. “Cognitive Development among Young Children in Ecuador: The Roles of Wealth, Health, and Parenting.” Journal of Human Resources 42(1): 49–84.
C. Paxson and N. Schady. 2007. “Does Money Matter? The Effect of Cash Transfers on Child Health and Development in Rural Ecuador.” Policy Research Working Paper 4226, World Bank, Washington, DC.
On Mexico: L. Fernald, P. Gertler, and L. Neufeld. 2006. “How Important is the Amount of Cash in Conditional Cash Transfer Programs for Child Development?” University of California at Berkeley, processed.
3. J. Jalan and M. Ravallion. 2003. “Does Piped Water Reduce Diarrhea for Children in Rural India?” Journal of Econometrics 112: 153–73.
4. M. M. Pitt and S. Khandker. 1998. “The Impact of Group-Based Credit Programs on Poor Households in Bangladesh: Does the Gender of Participants Matter?” Journal of Political Economy 106(5): 958–96.
For a different perspective see J. Morduch. 1999. “The Microfinance Promise.” Journal of Economic Literature 37(4): 1569–1614.
5. H. Kazianga, D. de Walque, and H. Alderman. 2009. “Educational and Health Impacts of Two School Feeding Schemes: Evidence from a Randomized Trial in Rural Burkina Faso.” Policy Research Working Paper 4976, World Bank, Washington, DC.
6. On Colombia: J. Angrist, E. Bettinger, E. Bloom, E. King, and M. Kremer. 2002. “Vouchers for Private Schooling in Colombia: Evidence from a Randomized Natural Experiment.” American Economic Review 92(5): 1535–58.
On Cambodia: D. Filmer and N. Schady. 2007. “Getting Girls Into School: Evidence from a Scholarship Program in Cambodia.” Economic Development and Cultural Change 56(3): 581–617.
7. E. Galasso. 2006. “With Their Effort and One Opportunity: Alleviating Extreme Poverty in Chile.” Development Research Group, World Bank, Washington, DC, processed.
8. S. Chen, R. Mu, and M. Ravallion. 2006. “Are There Lasting Impacts of Aid to Poor Areas? Evidence from Rural China.” Policy Research Working Paper 4084, World Bank, Washington, DC.
9. S. de Mel, D. McKenzie, and C. Woodruff. 2008. “Returns to Capital in Microenterprises: Evidence from a Field Experiment.” Quarterly Journal of Economics 123(4): 1329–72.
D. McKenzie and C. Woodruff. 2008. “Experimental Evidence on Returns to Capital and Access to Finance in Mexico.” World Bank Economic Review 22(3): 457–82.
10. P. Gertler. 2004. “Do Conditional Cash Transfers Improve Child Health? Evidence from PROGRESA’s Control Randomized Experiment.” American Economic Review 94(2, Papers and Proceedings): 336–41.
T. P. Schultz. 2004. “School Subsidies for the Poor: Evaluating the Mexican PROGRESA Poverty Program.” Journal of Development Economics 74(1): 199–250.
11. A. Wagstaff and S. Yu. 2007. “Do Health Sector Reforms have their Intended Impacts? The World Bank's Health VIII Project in Gansu Province, China.” Journal of Health Economics 26(3): 505–35.
12. D. Filmer and N. Schady. 2009. “Are There Diminishing Returns to Transfer Size in Conditional Cash Transfers?” Policy Research Working Paper 4999, World Bank, Washington, DC.
D. Filmer and N. Schady. 2009. “School Enrollment, Selection, and Test Scores.” Policy Research Working Paper 4998, World Bank, Washington, DC.
13. E. Galasso and M. Ravallion. 2004. “Social Protection in a Crisis: Argentina’s Plan Jefes y Jefas.” World Bank Economic Review 18(3): 367–99.
14. A. Fiszbein and N. Schady. 2009. Conditional Cash Transfers: Reducing Present and Future Poverty. Policy Research Report. Washington, DC: World Bank.
15. For further discussion see M. Ravallion. 2009. “Evaluation in the Practice of Development.” World Bank Research Observer 24(1): 29–54.