Machine Learning

Paper Session

Saturday, Jan. 7, 2017 10:15 AM – 12:15 PM

Hyatt Regency Chicago, Water Tower
Hosted By: Econometric Society
  • Chair: Jeff Ely, Northwestern University

Solving Heterogeneous Estimating Equations with Gradient Forests

Susan Athey
Stanford University


Forest-based methods are being used in an increasing variety of statistical tasks, including causal inference, survival analysis, and quantile regression. Extending forest-based methods to these new statistical settings requires specifying tree-growing algorithms that are targeted to the task at hand, and the ad-hoc design of such algorithms can require considerable effort. In this paper, we develop a unified framework for the design of fast tree-growing procedures for tasks that can be characterized by heterogeneous estimating equations. The resulting gradient forest consists of trees grown by recursively applying a pre-processing step where we label each observation with gradient-based pseudo-outcomes, followed by a regression step that runs a standard CART regression split on these pseudo-outcomes.
We apply our framework to two important statistical problems, non-parametric quantile regression and heterogeneous treatment effect estimation via instrumental variables, and we show that the resulting procedures considerably outperform baseline forests whose splitting rules do not take into account the statistical question at hand. Finally, we prove the consistency of gradient forests and establish a central limit theorem.
Our method will be available as an R package, gradientForest, which draws from the ranger package for random forests.
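
The two-step split described above (label observations with pseudo-outcomes, then run a CART regression split on them) can be illustrated with a minimal NumPy sketch for the quantile case. This is not the gradientForest implementation: the function names are hypothetical, the pseudo-outcome is taken to be the score of the quantile estimating equation with its constant scaling dropped, and only a single feature is searched.

```python
import numpy as np

def quantile_pseudo_outcomes(y, q):
    """Label each observation in the parent node with a pseudo-outcome.

    Assumption: we use the quantile score q - 1{y <= theta}, where theta is
    the q-th quantile estimated in the parent node. Constant rescaling is
    dropped because it does not change which split is chosen.
    """
    theta = np.quantile(y, q)
    return q - (y <= theta).astype(float)

def best_cart_split(x, rho):
    """Standard CART regression split of pseudo-outcomes rho on one feature x.

    Minimizing the within-child sum of squares is equivalent to maximizing
    sum_left**2 / n_left + sum_right**2 / n_right, which is what we track.
    """
    order = np.argsort(x)
    x_sorted, rho_sorted = x[order], rho[order]
    n = len(rho_sorted)
    csum = np.cumsum(rho_sorted)
    total = csum[-1]
    best_gain, best_threshold = -np.inf, None
    for i in range(1, n):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # cannot split between identical feature values
        left_sum = csum[i - 1]
        right_sum = total - left_sum
        gain = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if gain > best_gain:
            best_gain = gain
            best_threshold = 0.5 * (x_sorted[i - 1] + x_sorted[i])
    return best_threshold, best_gain
```

On data whose conditional mean is flat but whose spread changes with x, a split on the raw outcome has little to find, while a split on these pseudo-outcomes can pick up the shift in the upper quantile.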

Counterfactual Prediction with Deep Instrumental Variables Networks

Matt Taddy
University of Chicago


We are in the middle of a remarkable rise in the use and capability of artificial intelligence. Much of this growth has been fueled by the success of deep learning architectures: models that map from observables to outputs via multiple layers of latent representations. These deep learning algorithms are effective tools for unstructured prediction, and they can be combined in AI systems to solve complex automated reasoning problems. This paper provides a recipe for combining ML algorithms to solve for causal effects in the presence of instrumental variables – sources of treatment randomization that are conditionally independent from the response. We show that a flexible IV specification resolves into two prediction tasks that can be solved with deep neural nets: a first-stage network for treatment prediction and a second-stage network whose loss function involves integration over the conditional treatment distribution. This Deep IV framework imposes some specific structure on the stochastic gradient descent routine used for training, but it is general enough that we can take advantage of off-the-shelf ML capabilities and avoid extensive algorithm customization. We outline how to obtain out-of-sample causal validation in order to avoid overfitting. We also introduce schemes for both Bayesian and frequentist inference: the former via a novel adaptation of dropout training, and the latter via a data splitting routine.
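
As a rough illustration of a second-stage loss that integrates over the conditional treatment distribution, the sketch below approximates the integral by averaging over treatment draws. It is only a sketch under assumptions: the function name and signature are hypothetical, the draws are assumed to come from some already fitted first-stage model of the treatment given covariates and instruments, and `h` stands in for the second-stage network.

```python
import numpy as np

def integrated_second_stage_loss(h, y, x, treatment_samples):
    """Monte Carlo version of mean_i (y_i - E[h(t, x_i) | x_i, z_i])**2.

    h                 : callable h(t, x) applied elementwise (stand-in for a network)
    y, x              : arrays of shape (n,)
    treatment_samples : array of shape (n, S); row i holds S draws of the
                        treatment from an assumed first-stage model for unit i
    """
    n, num_draws = treatment_samples.shape
    x_rep = np.repeat(x[:, None], num_draws, axis=1)   # align x_i with its draws
    h_bar = h(treatment_samples, x_rep).mean(axis=1)   # average h over the draws
    return np.mean((y - h_bar) ** 2)
```

Because the average over draws sits inside the square, a stochastic gradient built from a single set of draws is not automatically unbiased, which is one reason the training routine needs some care.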

Approximate Residual Balancing: De-Biased Inference of Average Treatment Effects in High Dimensions

Stefan Wager
Stanford University


There are many settings where researchers are interested in estimating average treatment effects and are willing to rely on the unconfoundedness assumption, which requires that the treatment assignment be as good as random conditional on pre-treatment variables. The unconfoundedness assumption is often more plausible if a large number of pre-treatment variables are included in the analysis, but this can worsen the performance of standard approaches to treatment effect estimation. In this paper, we develop a method for de-biasing penalized regression adjustments to allow sparse regression methods like the lasso to be used for √n-consistent inference of average treatment effects. Our method works under substantially weaker assumptions than other methods considered in the literature: unlike high-dimensional doubly robust methods recently developed in econometrics, we do not need to assume that the treatment propensities are estimable, and unlike existing de-biasing techniques from the statistics literature, our method is not limited to considering sparse contrasts of the parameter vector. Instead, in addition to the standard assumptions used to make lasso regression on the outcome model consistent under 1-norm error, we only require overlap, i.e., that the propensity score be uniformly bounded away from 0 and 1. Procedurally, our method combines balancing weights with a regularized regression adjustment.
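
The combination of balancing weights with a regularized regression adjustment can be sketched in a few lines of NumPy. The stand-ins here are assumptions chosen so the example has closed forms: ridge regression replaces the lasso, and the weights come from an unconstrained L2-penalized least-squares problem rather than whatever optimization the paper uses. All function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge coefficients (stand-in for a lasso fit of the outcome model)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def l2_balancing_weights(X, target, lam):
    """Weights gamma minimizing ||X.T @ gamma - target||**2 + lam * ||gamma||**2.

    Closed form: gamma = X @ (X.T X + lam I)^{-1} @ target. These weights pull
    the weighted covariate average toward `target` while staying small.
    """
    p = X.shape[1]
    return X @ np.linalg.solve(X.T @ X + lam * np.eye(p), target)

def residual_balanced_mean(X, y, target, lam_beta, lam_gamma):
    """Regression prediction at `target` plus a weighted-residual correction."""
    beta = ridge_fit(X, y, lam_beta)
    gamma = l2_balancing_weights(X, target, lam_gamma)
    return target @ beta + gamma @ (y - X @ beta)
```

When the outcome is linear in the covariates and the weights balance them well, the residual term cancels the shrinkage bias of the penalized fit, which the plug-in prediction `target @ beta` on its own does not.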

Joint work with Susan Athey and Guido Imbens
JEL Classifications
  • C0 - General