Asset Embeddings
Abstract
Firm characteristics are ubiquitously used in economics. These characteristics are often
based on readily available information such as accounting data, but such data reflect only a part
of investors’ information set. We show that useful information about firm characteristics is
embedded in investors’ holdings data and, via market clearing, in prices, returns, and trading
data. Based on insights from the recent artificial intelligence (AI) and machine learning (ML)
literature, in which unstructured data (e.g., words or speech) are represented as continuous
vectors in a potentially high-dimensional space, we propose to learn asset embeddings from
investors’ holdings data. Indeed, just as documents arrange words that can be used to uncover
word structures via embeddings, investors organize assets in portfolios that can be used to
uncover firm characteristics that investors deem important via asset embeddings. This broad
theme provides a natural bridge to connect recent advances in the fields of AI and ML to finance
and economics. Specifically, we show how language models, including transformer models that
feature prominently in large language models such as BERT and GPT, can handle numerical
information, in particular holdings data, to estimate asset embeddings. We provide initial
evidence on the value added of asset embeddings through a series of applications in the context
of firm valuations, return comovement, and uncovering asset substitution patterns. As a
by-product, the models generate investor embeddings, which can be used to measure investor
similarity. We propose a programmatic list of potential applications of asset and investor embeddings
to finance and economics more generally.
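
To make the idea of learning embeddings from holdings concrete, the sketch below factors a toy investors-by-assets holdings matrix with a truncated singular value decomposition. This is a deliberately simple stand-in, not the language or transformer models described above: the function names, the choice of portfolio weights as input, the column centering, and the toy numbers are all assumptions made for illustration. It shows only how one factorization can yield both asset embeddings and, as a by-product, investor embeddings.

```python
import numpy as np


def holdings_embeddings(holdings, dim):
    """Factor an investors-by-assets holdings matrix with a truncated SVD.

    Hypothetical helper for illustration only; returns
    (investor_embeddings, asset_embeddings), each with `dim` columns.
    """
    holdings = np.asarray(holdings, dtype=float)
    # Assumption: work with portfolio weights, so each investor's row sums to 1.
    weights = holdings / holdings.sum(axis=1, keepdims=True)
    # Center each asset's column so the factors capture deviations
    # from the average portfolio rather than the average itself.
    centered = weights - weights.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Split the singular values evenly between the two sets of embeddings.
    scale = np.sqrt(s[:dim])
    investor_emb = u[:, :dim] * scale
    asset_emb = vt[:dim, :].T * scale
    return investor_emb, asset_emb


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up holdings: rows are investors, columns are assets.
# Investors 0-1 mostly hold assets 0-1, investors 2-3 mostly hold
# assets 2-3, and investor 4 holds everything equally.
holdings = np.array([
    [50, 40, 0, 0],
    [45, 55, 5, 0],
    [0, 5, 60, 40],
    [0, 0, 35, 65],
    [25, 25, 25, 25],
])

investor_emb, asset_emb = holdings_embeddings(holdings, dim=2)

# Assets held together should sit closer than assets held apart.
same_group = cosine_similarity(asset_emb[0], asset_emb[1])
cross_group = cosine_similarity(asset_emb[0], asset_emb[2])
print("asset 0 vs 1:", same_group, "asset 0 vs 2:", cross_group)
```

In this toy example, assets that tend to appear in the same portfolios end up with more similar embeddings than assets that do not, and the same comparison applied to rows of `investor_emb` gives a crude measure of investor similarity.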