Analysis of Panel and Clustered Data
Saturday, Jan. 4, 2020 8:00 AM - 10:00 AM (PDT)
- Chair: Douglas Steigerwald, University of California-Santa Barbara
Shift-Share Designs: Theory and Inference
Abstract: We study inference in shift-share regression designs, such as when a regional outcome is regressed on a weighted average of observed sectoral shocks, using regional sector shares as weights. We conduct a placebo exercise in which we estimate the effect of a shift-share regressor constructed with randomly generated sectoral shocks on actual labor market outcomes across U.S. Commuting Zones. Tests based on commonly used standard errors with 5% nominal significance level reject the null of no effect in up to 55% of the placebo samples. We use a stylized economic model to show that this overrejection problem arises because regression residuals are correlated across regions with similar sectoral shares, independently of their geographic location. We derive novel inference methods that are valid under arbitrary cross-regional correlation in the regression residuals. We show that our methods yield substantially wider confidence intervals in popular applications of shift-share regression designs.
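As a rough illustration of the regressor described in this abstract, the sketch below builds a shift-share variable as a share-weighted average of sectoral shocks, using randomly generated placebo shocks. The function name, array shapes, and simulated shares are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def shift_share_regressor(shares, shocks):
    """Return X_i = sum_s shares[i, s] * shocks[s] for every region i.

    shares: (n_regions, n_sectors) array of regional sector shares.
    shocks: (n_sectors,) array of sectoral shocks.
    """
    shares = np.asarray(shares, dtype=float)
    shocks = np.asarray(shocks, dtype=float)
    if shares.ndim != 2 or shocks.ndim != 1 or shares.shape[1] != shocks.shape[0]:
        raise ValueError("shares must be (regions, sectors) and shocks (sectors,)")
    return shares @ shocks


# Placebo-style example: hypothetical shares, randomly generated shocks.
rng = np.random.default_rng(0)
n_regions, n_sectors = 5, 3
shares = rng.dirichlet(np.ones(n_sectors), size=n_regions)  # each row sums to 1
placebo_shocks = rng.normal(size=n_sectors)
x_placebo = shift_share_regressor(shares, placebo_shocks)
```

Because every region's regressor is built from the same small set of sectoral shocks, regions with similar shares receive similar values of the regressor, which is the channel behind the residual correlation the abstract describes.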
Robust Semiparametric Estimation in Panel Multinomial Choice Models
Abstract: This paper proposes a simple yet robust method for semiparametric identification and estimation in panel multinomial choice models, where we allow for infinite-dimensional fixed effects in the presence of additive nonseparability, thus incorporating rich forms of unobserved heterogeneity. Our identification strategy exploits the standard notion of multivariate monotonicity in its contrapositive form, which provides powerful leverage for converting observable events into identifying restrictions on unknown parameters. Specifically, we show how certain configurations of conditional choice probabilities preserve weak monotonicity in an index vector, despite the presence of infinite-dimensional nuisance parameters. Then, by taking the logical contraposition of an intertemporal inequality on conditional choice probabilities from two time periods, we obtain an identifying restriction on the index values. Based on our identification result, we construct consistent set (or point) estimators, together with a computational algorithm adapted to the challenges of this framework. The first stage of our two-stage procedure nonparametrically estimates a collection of inequalities concerning intertemporal differences in conditional choice probabilities, where we adopt a machine learning algorithm using artificial neural networks. In the second stage, we compute the final estimator as the set of minimizers of our sample criterion function. Here, we adopt a spherical-coordinate reparameterization to exploit a combination of topological, geometric and computational advantages. The estimated model is then shown to be usable for counterfactual analysis, such as predicting the effect of a promotional campaign on product sales. We conduct a simulation study to analyze the finite-sample performance of our method and the adequacy of our computational procedure for practical implementation.
We then apply our procedure to the Nielsen data on popcorn sales to explore the effects of marketing promotions. In our model, we permit rich unobserved heterogeneity in factors such as brand loyalty or responsiveness to subtle flavor and packaging designs, which may affect choices in complex ways. The results show that our procedure produces estimates that conform well with economic intuition. For example, we find that special in-store displays boost sales not only through a direct promotion effect but also through the attenuation of consumers’ price sensitivity.
The Wild Bootstrap with a “Small” Number of “Large” Clusters
Abstract: This paper studies the properties of the wild bootstrap-based test proposed in Cameron et al. (2008) in settings with clustered data. Cameron et al. (2008) provide simulations that suggest this test works well even in settings with as few as five clusters, but existing theoretical analyses of its properties all rely on an asymptotic framework in which the number of clusters is “large.” In contrast to these analyses, we employ an asymptotic framework in which the number of clusters is “small,” but the number of observations per cluster is “large.” In this framework, we provide conditions under which the limiting rejection probability of an un-Studentized version of the test does not exceed the nominal level. Importantly, these conditions require, among other things, certain homogeneity restrictions on the distribution of covariates. We further establish that the limiting rejection probability of a Studentized version of the test does not exceed the nominal level by more than an amount that decreases exponentially with the number of clusters. We study the relevance of our theoretical results for finite samples via a simulation study.
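For readers unfamiliar with the mechanics, the following is a minimal sketch of an un-Studentized wild cluster bootstrap test of a zero coefficient, with the null imposed and Rademacher weights drawn once per cluster. It is a simplified illustration, not the exact procedure analyzed in the paper; the function name, its arguments, and the simulated data are assumptions.

```python
import numpy as np


def wild_cluster_bootstrap_pvalue(y, X, cluster_ids, coef_index, n_boot=999, seed=0):
    """Bootstrap p-value for H0: beta[coef_index] = 0 in y = X beta + u.

    Assumes X has at least one other column (e.g. an intercept).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    clusters, cluster_pos = np.unique(np.asarray(cluster_ids).ravel(), return_inverse=True)

    # Unrestricted OLS estimate; the un-Studentized statistic is |beta_hat_j|.
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    t_obs = abs(beta_hat[coef_index])

    # Restricted OLS: impose the null by dropping the tested column.
    X_r = np.delete(X, coef_index, axis=1)
    beta_r = np.linalg.lstsq(X_r, y, rcond=None)[0]
    fitted_r = X_r @ beta_r
    resid_r = y - fitted_r

    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        # One Rademacher sign per cluster, shared by all of its observations.
        signs = rng.choice([-1.0, 1.0], size=len(clusters))
        y_star = fitted_r + signs[cluster_pos] * resid_r
        beta_star = np.linalg.lstsq(X, y_star, rcond=None)[0]
        t_boot[b] = abs(beta_star[coef_index])

    return float(np.mean(t_boot >= t_obs))


# Simulated example: 20 clusters of 30 observations with a common cluster effect.
rng = np.random.default_rng(1)
n_clusters, n_per = 20, 30
cluster_ids = np.repeat(np.arange(n_clusters), n_per)
x = rng.normal(size=n_clusters * n_per)
u = rng.normal(size=n_clusters)[cluster_ids] + rng.normal(size=n_clusters * n_per)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones_like(x), x])
p_value = wild_cluster_bootstrap_pvalue(y, X, cluster_ids, coef_index=1)
```

With G clusters there are only 2^G distinct sign vectors, so when G is as small as five the bootstrap distribution has at most 32 support points. That coarseness is one reason the behavior of the test with few clusters calls for the separate analysis the abstract describes.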
Inference for Dependent Data with Cluster Learning
Abstract: This paper proposes a cluster-based inferential procedure. Observations are grouped into clusters, which are learned using an unsupervised learning algorithm given a dissimilarity measure. We consider a set of cluster-based inference procedures applied to the learned clusters. We give conditions under which our procedure asymptotically attains correct size. We illustrate the finite-sample validity of the procedure and apply it to an empirical example.
Testing for Treatment Effects in Randomized Control Trials: The Effect of Differing Cluster Sizes
Abstract: We consider a common situation where we have binary responses to compare between two samples, but the observations are not independent. The dependence structure is such that there are a number of independent clusters, but the observations within each cluster are dependent. Sandwich-type variance estimators that are based entirely on the variability between the statistics for each cluster are robust to this kind of dependence. We establish a set of sufficient conditions under which the variance estimators are consistent, so that the resulting test statistic has a standard normal distribution in the limit. The conditions require that the number of clusters goes to infinity and that there is enough homogeneity that no one cluster dominates the calculation.
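One estimator of the kind described here can be sketched as follows: each sample's proportion is a ratio of cluster totals, and its variance is estimated from the between-cluster variability of the per-cluster scores. This is a generic linearization-based illustration with no finite-sample correction, not necessarily the estimator studied in the paper; the function names and the toy data are assumptions.

```python
import numpy as np


def cluster_proportion(y, cluster_ids):
    """Pooled proportion and its cluster-robust variance estimate."""
    y = np.asarray(y, dtype=float)
    _, pos = np.unique(np.asarray(cluster_ids).ravel(), return_inverse=True)
    totals = np.bincount(pos, weights=y)       # successes per cluster
    sizes = np.bincount(pos).astype(float)     # observations per cluster
    p_hat = totals.sum() / sizes.sum()
    # Per-cluster scores; their squares capture between-cluster variability.
    scores = totals - p_hat * sizes
    var_hat = np.sum(scores ** 2) / sizes.sum() ** 2
    return p_hat, var_hat


def two_sample_cluster_z(y1, clusters1, y0, clusters0):
    """z-statistic for equal proportions across two independent clustered samples."""
    p1, v1 = cluster_proportion(y1, clusters1)
    p0, v0 = cluster_proportion(y0, clusters0)
    return (p1 - p0) / np.sqrt(v1 + v0)


# Toy data: two clusters of four binary responses in each sample.
y_treat = [1, 1, 1, 0, 1, 0, 0, 0]
y_ctrl = [1, 0, 0, 0, 1, 0, 0, 0]
ids = [0, 0, 0, 0, 1, 1, 1, 1]
z = two_sample_cluster_z(y_treat, ids, y_ctrl, ids)
```

Each score is weighted by its cluster's size, so one very large cluster can dominate the sum of squared scores. The homogeneity condition in the abstract, that no one cluster dominates the calculation, rules this out.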
- C2 - Single Equation Models; Single Variables
- C3 - Multiple or Simultaneous Equation Models; Multiple Variables