Meta-Analysis and Reproducibility in Economics Research
Sunday, Jan. 8, 2017 8:00 AM – 10:00 AM
Hyatt Regency Chicago, Toronto
- Chair: Edward Miguel, University of California-Berkeley
External Validity in United States Education Research
Abstract: As methods for internal validity improve, methodological concerns have shifted toward assessing how well the research community can extrapolate from individual studies. Under recent federal granting initiatives, over $1 billion has been awarded to education programs that have been validated by a single randomized or natural experiment. If these experiments have weak external validity, scientific advancement is delayed and federal education funding might be squandered. By analyzing trials clustered within interventions, this research describes how well a single study’s results are predicted by additional studies of the same intervention, and how well study samples match the target populations of interventions. I find that U.S. education trials are conducted on samples of students who are systematically less white and more socioeconomically disadvantaged than the overall student population. Moreover, I find that effect sizes tend to decay in the second and third trials of interventions.
Aggregating Distributional Treatment Effects: A Bayesian Hierarchical Analysis of the Microcredit Literature
Abstract: This paper develops methods to aggregate evidence on distributional treatment effects from multiple studies conducted in different settings, and applies them to the microcredit literature. Several randomized trials of expanding access to microcredit found substantial effects on the tails of household outcome distributions, but the extent to which these findings generalize to future settings was not known. Aggregating the evidence on sets of quantile effects poses additional challenges relative to average effects because distributional effects must imply monotonic quantiles and pass information across quantiles. Using a Bayesian hierarchical framework, I develop new models to aggregate distributional effects and assess their generalizability. For continuous outcome variables, the methodological challenges are addressed by applying transforms to the unknown parameters. For partially discrete variables such as business profits, I use contextual economic knowledge to build tailored parametric aggregation models. I find generalizable evidence that microcredit has negligible impact on the distribution of various household outcomes below the 75th percentile, but above this point there is no generalizable prediction. Thus, while microcredit typically does not lead to worse outcomes at the group level, there is no generalizable evidence on whether it improves group outcomes. Households with previous business experience account for the majority of the impact and see large increases in the right tail of the consumption distribution.
Why Economics is Weak and Biased
Abstract: This paper investigates two critical dimensions of the credibility of empirical economics research: statistical power and bias. We survey 64,076 economic estimates from 159 empirical economics literatures drawn from more than 6,700 empirical studies. Using this extensive quantitative survey of empirical economics, we calculate statistical power and likely bias. Taking a ‘conservative’ approach (that is, one prone to over-estimate power), the median of these 159 median powers is no more than 18% and likely closer to 10%. Furthermore, 90% of reported findings are underpowered (relative to the widely accepted 80% convention) in half of these areas of research, and 20% of these areas consist entirely of underpowered studies. However, other disciplines are also underpowered. For example, the median power among 14,886 meta-analyses of medical research is only 8% (Turner et al., 2013).
Low power makes economic findings mixed and predictably unreliable. “Not only do underpowered studies lead to a confusing literature but they also create a literature that contains biased estimates of effect” (Maxwell, 2004, p.161). Focus on statistical power leads to a new empirical estimator of effect: the weighted average of the adequately powered (WAAP). WAAP uses optimal WLS weights and reduces reporting bias without making any assumption about the cause, distribution or model of selection bias. Lastly, we employ this adequately powered weighted average to assess the overall magnitude of bias in economics. Typically, reported economic effects are exaggerated by a factor of two, with one-third inflated by a factor of four or more.
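The idea behind WAAP can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not spell out the computational details, so the following are assumptions: power is computed for a two-sided z-test at the 5% level, the inverse-variance (WLS) weighted average of all estimates stands in for the unknown true effect when judging power, and "adequately powered" means power of at least 80%. The function names `power`, `wls_mean` and `waap` are hypothetical.

```python
from statistics import NormalDist


def power(true_effect, se, alpha=0.05):
    """Power of a two-sided z-test to detect true_effect, given standard error se."""
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    shift = abs(true_effect) / se     # effect measured in standard-error units
    # Probability that the test statistic lands in either rejection region
    return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)


def wls_mean(estimates, ses):
    """Inverse-variance (WLS) weighted average of the estimates."""
    weights = [1 / s ** 2 for s in ses]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


def waap(estimates, ses, target_power=0.80, alpha=0.05):
    """Weighted average of the adequately powered estimates (illustrative only).

    Assumption: the WLS average of all estimates is used as the proxy
    for the true effect when deciding which estimates are adequately powered.
    Returns None when no estimate reaches target_power.
    """
    proxy = wls_mean(estimates, ses)
    kept = [(e, s) for e, s in zip(estimates, ses)
            if power(proxy, s, alpha) >= target_power]
    if not kept:
        return None
    kept_estimates, kept_ses = zip(*kept)
    return wls_mean(kept_estimates, kept_ses)
```

Called as `waap([0.10, 0.12, 0.40, 0.55], [0.02, 0.03, 0.20, 0.25])` on made-up numbers, the sketch drops the two imprecise estimates, which happen to be the largest, and averages only the two precise ones. This shows how restricting attention to adequately powered estimates can pull the summary effect toward the more reliable studies.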
Maxwell, S.E. (2004). ‘The persistence of underpowered studies in psychological research: causes, consequences, and remedies’, Psychological Methods, vol. 9, pp. 147-63.
Turner, R.M., Bird, S.M. and Higgins, J.P.T. (2013). ‘The impact of study size on meta-analyses: examination of underpowered studies in Cochrane reviews’, PLoS ONE, vol. 8(3), e59202. doi:10.1371/journal.pone.0059202.
Massachusetts Institute of Technology
Australian National University
University of California-Berkeley
University of California-Berkeley
- A3 - Collective Works
- C1 - Econometric and Statistical Methods and Methodology: General