
Machine Learning and Shrinkage Estimation

Paper Session

Saturday, Jan. 5, 2019 2:30 PM - 4:30 PM

Atlanta Marriott Marquis, A708
Hosted By: Econometric Society
  • Chair: Whitney Newey, Massachusetts Institute of Technology

Nuclear Norm Regularized Estimation of Panel Regression Models

Hyungsik Roger Moon
University of Southern California
Martin Weidner
University College London


In this paper we investigate panel regression models with interactive fixed effects.
We propose two new estimation methods that are based on minimizing convex objective functions. The first estimation method minimizes the sum of squared residuals
with a nuclear (trace) norm regularization. The second estimation method minimizes
the nuclear norm of the residuals. First, we establish the consistency of the two estimators, and then we show how to use them as preliminary estimators
to construct an estimator that is asymptotically equivalent to the QMLE in Bai
(2009) and Moon and Weidner (2017). For this, we propose an iteration procedure and
derive its asymptotic properties.
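
As a rough illustration of the first method, the sketch below minimizes a nuclear-norm-penalized sum of squared residuals for a panel with a single regressor. The simplified model Y = beta * X + Gamma + E, the penalty scaling, the alternating update scheme, and the names `svt`, `objective`, and `nuclear_norm_ls` are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def objective(Y, X, beta, gamma, psi):
    """0.5 * squared Frobenius norm of the residuals plus psi * nuclear norm of gamma."""
    resid = Y - beta * X - gamma
    return 0.5 * np.sum(resid ** 2) + psi * np.linalg.norm(gamma, "nuc")

def nuclear_norm_ls(Y, X, psi, n_iter=500, tol=1e-10):
    """Alternate exact minimization over gamma (low-rank part) and beta (slope)."""
    beta = 0.0
    gamma = np.zeros_like(Y)
    for _ in range(n_iter):
        # Given beta, the minimizer over gamma is a soft-thresholded SVD.
        gamma = svt(Y - beta * X, psi)
        # Given gamma, the minimizer over beta is a least-squares slope.
        new_beta = np.sum(X * (Y - gamma)) / np.sum(X * X)
        converged = abs(new_beta - beta) < tol
        beta = new_beta
        if converged:
            break
    return beta, gamma
```

Because the objective is convex and each update is an exact block minimization, the objective value never increases across iterations. The abstract's second method (minimizing the nuclear norm of the residuals) is not covered by this sketch.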

Adversarial Generalized Method of Moments

Gregory Lewis
Microsoft Research
Vasilis Syrgkanis
Microsoft Research


We provide an approach for learning deep neural net representations of models described via conditional moment restrictions. Conditional moment restrictions are widely used, as they are the language by which social scientists describe the assumptions they make to enable causal inference. We formulate the problem of estimating the underlying model as a zero-sum game between a modeler and an adversary and apply adversarial training. Our approach is similar in nature to Generative Adversarial Networks (GAN), though here the modeler is learning a representation of a function that satisfies a continuum of moment conditions and the adversary is identifying violating moments. We outline ways of constructing effective adversaries in practice, including kernels centered by k-means clustering, and random forests. We examine the practical performance of our approach in the setting of non-parametric instrumental variable regression.
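
To make the adversary's role concrete, the toy sketch below picks the most violated moment from a finite set of Gaussian kernel test functions, given residuals y - h(x) from some candidate model h and a scalar instrument z. The kernel form, the fixed `centers` (which the abstract suggests could come from k-means), and the function names are assumptions for illustration. It shows only the adversary's best response, not the full adversarial training loop.

```python
import numpy as np

def kernel_test_functions(z, centers, bandwidth):
    """Evaluate Gaussian kernels f_j(z) = exp(-(z - c_j)^2 / (2 h^2)) at every z."""
    diffs = z[:, None] - centers[None, :]
    return np.exp(-(diffs ** 2) / (2.0 * bandwidth ** 2))

def most_violated_moment(residuals, z, centers, bandwidth):
    """Return (index, value) of the empirical moment E_n[residual * f_j(z)] largest in absolute value."""
    F = kernel_test_functions(z, centers, bandwidth)
    moments = F.T @ residuals / len(residuals)
    j = int(np.argmax(np.abs(moments)))
    return j, float(moments[j])
```

If the conditional moment restriction E[y - h(x) | z] = 0 held exactly in the sample, every such empirical moment would be zero. A modeler would then update h to shrink the violation the adversary reports.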

(Machine) Learning Parameter Regions

José Luis Montiel Olea
Columbia University
James Nesbit
New York University


Taking random draws from a parameter region in order to approximate it
is a supervised learning problem, analogous to sampling pixels of an image to
recognize it. Misclassification error—a common machine learning criterion—
provides an off-the-shelf tool to assess the quality of set approximations. We
say a parameter region can be properly approximated—or learned—if there
is an algorithm that yields a misclassification error of at most ε with probability
at least 1-δ, regardless of the sampling distribution. We show that
learning a parameter region is possible if and only if it is not too complex.
Moreover, the tightest band that contains a d-dimensional parameter region is
always learnable from the inside (in a sense we make precise), with at least
((1-ε)/ε) ln(1/δ) draws, but at most (2d/ε) ln(2d/δ). We illustrate our results
using structural vector autoregressions. We show how many orthogonal matrices
are necessary/sufficient to evaluate the impulse responses’ identified set
and how many ‘shotgun’ plots to report when conducting joint inference on
impulse responses.
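
For a sense of scale, the two bounds on the number of draws can be evaluated directly. The sketch below assumes the bounds read as ((1-ε)/ε) ln(1/δ) and (2d/ε) ln(2d/δ), and rounds both up to whole draws; the function name `draw_bounds` and the rounding are choices made for this example.

```python
import math
from typing import Tuple

def draw_bounds(d: int, eps: float, delta: float) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the number of draws, rounded up to integers."""
    if d < 1 or not (0.0 < eps < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("need d >= 1 and eps, delta strictly between 0 and 1")
    lower = math.ceil((1.0 - eps) / eps * math.log(1.0 / delta))
    upper = math.ceil(2.0 * d / eps * math.log(2.0 * d / delta))
    return lower, upper

print(draw_bounds(d=1, eps=0.1, delta=0.1))  # (21, 60)
```

With d = 1 and ε = δ = 0.1, this reading gives at least 21 and at most 60 draws.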

Double/De-Biased Machine Learning Using Regularized Riesz Representers

Victor Chernozhukov
Massachusetts Institute of Technology
Whitney Newey
Massachusetts Institute of Technology
James Robins
Harvard University


We provide adaptive inference methods for linear functionals of ℓ1-regularized linear approximations to the conditional expectation function. Examples of such functionals include average derivatives, policy effects, average treatment effects, and many others. The construction relies on building Neyman-orthogonal equations that are approximately invariant to perturbations of the nuisance parameters, including the Riesz representer for the linear functionals. We use ℓ1-regularized methods to learn the approximations to the regression function and the Riesz representer, and construct the estimator for the linear functionals as the solution to the orthogonal estimating equations. We establish that under weak assumptions the estimator concentrates in a 1/√n neighborhood of the target with deviations controlled by the normal laws, and the estimator attains the semi-parametric efficiency bound in many cases. In particular, either the approximation to the regression function or the approximation to the Riesz representer can be “dense” as long as one of them is sufficiently “sparse”. Our main results are non-asymptotic and imply asymptotic uniform validity over large classes of models.
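
One common way to write the orthogonal estimating equation for a linear functional is sketched below. The notation (W for the data, γ for the regression function, α for the Riesz representer, m for the functional's moment function) is assumed here for illustration and is not taken from the abstract.

```latex
% Target: a linear functional of the regression function \gamma_0(X) = E[Y \mid X]
\theta_0 = E\big[m(W, \gamma_0)\big],
\qquad
E\big[m(W, \gamma)\big] = E\big[\alpha_0(X)\,\gamma(X)\big] \ \text{for all } \gamma .

% Orthogonal score: first-order insensitive to errors in both \gamma and \alpha
\psi(W; \theta, \gamma, \alpha) = m(W, \gamma) + \alpha(X)\,\big(Y - \gamma(X)\big) - \theta .

% Example (average treatment effect, X = (D, Z), propensity \pi(Z) = P(D = 1 \mid Z)):
m(W, \gamma) = \gamma(1, Z) - \gamma(0, Z),
\qquad
\alpha_0(D, Z) = \frac{D}{\pi(Z)} - \frac{1 - D}{1 - \pi(Z)} .
```

The correction term α(X)(Y − γ(X)) has mean zero at the true γ, which is why first-order errors in either nuisance estimate do not bias the estimate of θ.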
JEL Classifications
  • C2 - Single Equation Models; Single Variables
  • C5 - Econometric Modeling