Machine Learning in Econometrics

Paper Session

Friday, Jan. 6, 2017 10:15 AM – 12:15 PM

Hyatt Regency Chicago, Crystal B
Hosted By: American Economic Association
  • Chair: Victor Chernozhukov, Massachusetts Institute of Technology

Double Machine Learning: Improved Point and Interval Estimation of Treatment and Causal Parameters

Victor Chernozhukov, Massachusetts Institute of Technology

Abstract

Most supervised machine learning (ML) methods are explicitly designed to solve prediction problems very well. Achieving this goal does not imply that these methods automatically deliver good estimators of causal parameters. Examples of such parameters include individual regression coefficients, average treatment effects, average lifts, and demand or supply elasticities. In fact, estimates of such causal parameters obtained by naively plugging ML estimators into estimating equations can behave very poorly, for example, by formally having inferior rates of convergence with respect to the sample size n, caused by regularization bias. Fortunately, this regularization bias can be removed by solving auxiliary prediction problems via ML tools. Specifically, we can form an efficient score for the target low-dimensional parameter by combining auxiliary and main ML predictions. The efficient score may then be used to build an efficient estimator of the target parameter which typically converges at the fastest possible 1/√n rate, is approximately unbiased and normal, and from which valid confidence intervals for the parameters of interest may be constructed. The resulting method could thus be called a "double ML" method because it relies on estimating primary and auxiliary predictive models. Such double ML estimators achieve the fastest rates of convergence and robust behavior with respect to a broader class of probability distributions than naive "single" ML estimators. We illustrate the proposed methods with an application to estimating the effect of 401(k) eligibility on accumulated assets.
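
As an illustration of the idea (a minimal sketch under stated assumptions, not the authors' implementation), cross-fitted double ML for the coefficient theta in a partially linear model y = theta*d + g(X) + eps can be written in a few lines; the learner choice (LassoCV here) and the number of folds are illustrative assumptions:

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

def double_ml_plr(y, d, X, learner=None, n_folds=2, seed=0):
    """Cross-fitted double ML estimate of theta in y = theta*d + g(X) + eps.

    Two auxiliary ML predictions (y on X and d on X) are residualized out,
    which removes the first-stage regularization bias from the final estimate.
    """
    learner = learner if learner is not None else LassoCV(cv=3)
    y_res = np.empty_like(y, dtype=float)
    d_res = np.empty_like(d, dtype=float)
    for train, test in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        # nuisance fits on one fold, residuals formed on the held-out fold
        y_res[test] = y[test] - clone(learner).fit(X[train], y[train]).predict(X[test])
        d_res[test] = d[test] - clone(learner).fit(X[train], d[train]).predict(X[test])
    theta = (d_res @ y_res) / (d_res @ d_res)
    # standard error from the orthogonal (Neyman) score psi = (y_res - theta*d_res)*d_res
    psi = (y_res - theta * d_res) * d_res
    se = np.sqrt(psi.var() / len(y)) / np.mean(d_res ** 2)
    return theta, se
```

The final regression of the d-residuals on the y-residuals is what makes the estimator insensitive to small errors in either nuisance fit.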

Testing-Based Forward Model Selection

Damian Kozbur, ETH Zurich

Abstract

This work introduces a theoretical foundation for a procedure called "testing-based forward model selection" in regression problems. Forward selection is a general term for a model selection procedure that inductively selects covariates which add predictive power to a working statistical model. This paper considers the use of testing procedures, derived from traditional statistical hypothesis testing, as a criterion for deciding which variable to include next and when to stop including variables. Probabilistic bounds for the prediction error and the number of selected covariates are proved for the proposed procedure. The general result is illustrated by an example with heteroskedastic data in which Huber-Eicker-White standard errors are used to construct the tests. The performance of testing-based forward selection is compared to Lasso and Post-Lasso in simulation studies. Finally, its use is illustrated with an application to estimating the effects of institution quality on aggregate economic output.
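
A minimal sketch of such a procedure (illustrative, not the paper's exact algorithm): at each step, tentatively add each remaining covariate, compute its heteroskedasticity-robust (HC0) t-statistic, include the best candidate, and stop when no candidate's |t| exceeds a critical value. The critical value and stopping rule here are assumptions for illustration:

```python
import numpy as np

def forward_select(y, X, crit=3.0):
    """Testing-based forward selection using HC0-robust t-statistics (sketch)."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    while remaining:
        best_j, best_t = None, 0.0
        for j in remaining:
            # regression of y on intercept, current model, and candidate j
            Z = np.column_stack([np.ones(n)] + [X[:, k] for k in selected + [j]])
            beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
            e = y - Z @ beta
            ZtZ_inv = np.linalg.inv(Z.T @ Z)
            V = ZtZ_inv @ (Z.T * e ** 2) @ Z @ ZtZ_inv   # HC0 sandwich variance
            t = beta[-1] / np.sqrt(V[-1, -1])            # t-stat of the candidate
            if abs(t) > abs(best_t):
                best_j, best_t = j, t
        if best_j is None or abs(best_t) < crit:
            break  # no candidate passes the test: stop
        selected.append(best_j)
        remaining.remove(best_j)
    return selected
```

The robust variance matters under heteroskedasticity: an ordinary t-statistic would over- or under-state the evidence for inclusion.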

L2-Boosting for Economic Applications

Martin Spindler, University of Hamburg and Max Planck Society
Ye Luo, University of Florida

Abstract

Boosting is one of the most significant developments in machine learning. This paper studies the statistical properties of L2-Boosting, which is tailored for regression, in a high-dimensional setting. Moreover, we introduce so-called "post-Boosting," a post-selection estimator that applies ordinary least squares to the variables selected in the first stage by L2-Boosting. Another variant is orthogonal Boosting, in which an orthogonal projection is conducted after each step. We analyze these variants and apply them to economic applications such as IV estimation and treatment effect estimation in high-dimensional settings. We derive results for inference in these settings, demonstrate the performance of Boosting in real applications, and compare the results to Lasso.
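
A minimal sketch of componentwise L2-Boosting and the post-Boosting refit (the step size, number of steps, and standardization choices here are illustrative assumptions, not the paper's tuning):

```python
import numpy as np

def l2_boost(y, X, steps=50, nu=0.1):
    """Componentwise L2-Boosting (sketch).

    Each iteration regresses the current residual on the single best-fitting
    standardized covariate and takes a small step of size nu in that direction.
    """
    n, p = X.shape
    Xs = (X - X.mean(0)) / X.std(0)
    r = y - y.mean()
    beta = np.zeros(p)
    for _ in range(steps):
        corr = Xs.T @ r / n
        j = np.argmax(np.abs(corr))                    # best-fitting covariate
        b = corr[j] / (Xs[:, j] @ Xs[:, j] / n)        # univariate OLS coefficient
        beta[j] += nu * b
        r = r - nu * b * Xs[:, j]
    return beta

def post_boost(y, X, steps=50, nu=0.1):
    """Post-Boosting: OLS refit on the support selected by L2-Boosting."""
    beta = l2_boost(y, X, steps, nu)
    S = np.flatnonzero(beta)
    Z = np.column_stack([np.ones(len(y)), X[:, S]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return S, coef
```

The refit removes the shrinkage that the small boosting steps impose on the selected coefficients, analogous to Post-Lasso after Lasso.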

Core Determining Class: Construction, Approximation and Inference

Ye Luo, University of Florida
Hai Wang, Singapore Management University

Abstract

The relations between unobserved events and observed outcomes in partially identified models can be characterized by a bipartite graph. We estimate the probability measure on the events given observations of the outcomes, based on the graph. The feasible set of probability measures on the events is defined by a set of linear inequality constraints, and the number of inequalities is often much larger than the number of observations. The set of irredundant inequalities is known as the Core Determining Class. We propose an algorithm that exploits the structure of the graph to construct the exact Core Determining Class when data noise is not taken into consideration. We prove that if the graph and the measure on the observed outcomes are non-degenerate, the Core Determining Class does not depend on the probability measure of the outcomes but only on the structure of the graph. For the more general problem of selecting linear inequalities under noise, we investigate sparsity assumptions on the full set of inequalities, i.e., that only a few inequalities are truly binding. We show that these sparsity assumptions are equivalent to certain sparsity conditions on the dual problem. We propose a procedure similar to the Dantzig Selector to select the truly informative constraints, analyze its properties, and show that the feasible set defined by the selected constraints is a nearly sharp estimator of the true feasible set. Under our sparsity assumptions, we prove that the procedure can significantly reduce the number of inequalities without discarding much information. We apply the procedure to the Core Determining Class problem and obtain a stronger theorem by taking advantage of the structure of the bipartite graph.
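
For reference, the classical Dantzig Selector that the proposed procedure resembles can be posed as a linear program; this sketch solves it for an ordinary sparse regression (the paper's constraint-selection variant is not shown, and the penalty level is an illustrative assumption):

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(y, X, lam):
    """Classical Dantzig Selector (sketch):
    minimize ||beta||_1  subject to  ||X'(y - X beta)||_inf <= lam,
    written as an LP in beta = beta_plus - beta_minus with both parts >= 0."""
    n, p = X.shape
    G, g = X.T @ X, X.T @ y
    c = np.ones(2 * p)                        # objective: sum(beta+) + sum(beta-)
    A_ub = np.vstack([np.hstack([-G, G]),     # X'(y - X beta) <= lam
                      np.hstack([G, -G])])    # X'(y - X beta) >= -lam
    b_ub = np.concatenate([lam - g, lam + g])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    u = res.x
    return u[:p] - u[p:]
```

The l1 objective drives most coefficients exactly to zero, so the active constraints play the role of the "truly informative" inequalities in the selection problem described above.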

Estimating Average Treatment Effects in Settings with Many Covariates: Supplementary Analyses and Remaining Challenges

Susan Athey, Stanford University
Guido Imbens, Stanford University
Thai Pham, Stanford University
Stefan Wager, Stanford University

Abstract

There is a large literature in econometrics and statistics on semiparametric estimation of average treatment effects under the assumption of unconfounded treatment assignment. Recently this literature has focused on the setting with many covariates, where regularization of some kind is required. In this article we discuss some of the lessons from the earlier literature and their relevance for the many-covariate setting.
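
A minimal sketch of one standard estimator from this literature, the cross-fitted doubly robust (AIPW) estimator of the average treatment effect, with regularized nuisance fits (the learner choices, fold count, and propensity clipping are illustrative assumptions, not the authors' prescription):

```python
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV
from sklearn.model_selection import KFold

def aipw_ate(y, w, X, n_folds=2, seed=0, clip=0.01):
    """Cross-fitted AIPW estimate of the average treatment effect under
    unconfoundedness.  Outcome regressions use LassoCV; the propensity
    score uses penalized logistic regression."""
    n = len(y)
    m0, m1, e = np.empty(n), np.empty(n), np.empty(n)
    for tr, te in KFold(n_folds, shuffle=True, random_state=seed).split(X):
        t0, t1 = tr[w[tr] == 0], tr[w[tr] == 1]
        m0[te] = LassoCV(cv=3).fit(X[t0], y[t0]).predict(X[te])   # E[y|X, w=0]
        m1[te] = LassoCV(cv=3).fit(X[t1], y[t1]).predict(X[te])   # E[y|X, w=1]
        e[te] = LogisticRegressionCV(cv=3).fit(X[tr], w[tr]).predict_proba(X[te])[:, 1]
    e = np.clip(e, clip, 1 - clip)  # guard against extreme propensities
    # doubly robust score: regression adjustment plus inverse-propensity correction
    psi = m1 - m0 + w * (y - m1) / e - (1 - w) * (y - m0) / (1 - e)
    return psi.mean(), psi.std() / np.sqrt(n)
```

The correction terms make the estimator consistent if either the outcome model or the propensity model is well estimated, which is one of the central lessons carried over from the earlier semiparametric literature.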
Discussant(s)
Panagiotis Toulis, University of Chicago
Adam M. Rosen, University College London
Hai Wang, Singapore Management University
JEL Classifications
  • C1 - Econometric and Statistical Methods and Methodology: General
  • C3 - Multiple or Simultaneous Equation Models; Multiple Variables