Dual Interpretation of Machine Learning Forecasts
Abstract
Machine learning predictions are typically interpreted as the sum of contributions of predictors. Yet, each out-of-sample prediction can also be expressed as a linear
combination of in-sample values of the predicted variable, with weights corresponding
to pairwise proximity scores between current and past economic events.
While this dual route leads nowhere in some contexts (e.g., large cross-sectional
datasets), it provides sparser interpretations in settings with many regressors and
little training data, as in macroeconomic forecasting. In this case, the sequence
of contributions can be visualized as a time series, allowing analysts to explain
predictions as quantifiable combinations of historical analogies. Moreover, the
weights can be viewed as those of a data portfolio, inspiring new diagnostic measures
such as forecast concentration, short position, and turnover. We show how
weights can be retrieved seamlessly for (kernel) ridge regression, random forest,
boosted trees, and neural networks. Then, we apply these tools to analyze post-pandemic
forecasts of inflation, GDP growth, and recession probabilities. In all
cases, the approach opens the black box from a new angle and demonstrates how
machine learning models leverage history partly repeating itself.
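
As a minimal sketch of the dual route for the simplest case named above, plain ridge regression, the forecast x'(X'X + λI)⁻¹X'y can be rewritten as w'y, where w = X(X'X + λI)⁻¹x holds one weight per training observation. The code below checks this identity on simulated data and computes three portfolio-style diagnostics. The function names, the simulated data, the penalty value, and in particular the formulas used for concentration, short position, and turnover are illustrative assumptions and are not necessarily the definitions adopted in the paper; the data are assumed centered, so no intercept is fitted.

```python
import numpy as np


def ridge_dual_weights(X_train, x_new, lam):
    """Observation weights w such that the ridge forecast at x_new is w @ y_train.

    Hypothetical helper: assumes centered data and no intercept.
    """
    p = X_train.shape[1]
    A = X_train.T @ X_train + lam * np.eye(p)
    # w = X (X'X + lam I)^{-1} x_new, one weight per training observation
    return X_train @ np.linalg.solve(A, x_new)


# Illustrative diagnostics (assumed definitions, not taken from the paper).
def concentration(w):
    """Herfindahl-style index of absolute weight shares, in (0, 1]."""
    share = np.abs(w) / np.abs(w).sum()
    return float(np.sum(share ** 2))


def short_position(w):
    """Total magnitude of negative weights."""
    return float(-w[w < 0].sum())


def turnover(w_prev, w_curr):
    """Sum of absolute weight changes between two forecasts."""
    return float(np.abs(w_curr - w_prev).sum())


# Simulated setting with more regressors than observations.
rng = np.random.default_rng(0)
n, p, lam = 40, 60, 5.0
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
x_a = rng.standard_normal(p)  # first out-of-sample point
x_b = rng.standard_normal(p)  # second out-of-sample point

# Primal route: estimate coefficients, then sum predictor contributions.
beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
primal_forecast = float(x_a @ beta)

# Dual route: weight past values of the target directly.
w_a = ridge_dual_weights(X, x_a, lam)
w_b = ridge_dual_weights(X, x_b, lam)
dual_forecast = float(w_a @ y)

print("primal and dual forecasts agree:", np.isclose(primal_forecast, dual_forecast))
print("concentration:", concentration(w_a))
print("short position:", short_position(w_a))
print("turnover between the two forecasts:", turnover(w_a, w_b))
```

With n = 40 training observations and p = 60 regressors, the dual representation has 40 terms against 60 in the primal one, which is the sense in which it can be the sparser of the two when regressors outnumber observations.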