Using Text Data to Understand the Labor Market
Friday, Jan. 7, 2022 10:00 AM - 12:00 PM (EST)
- Chair: Erica Groshen, Cornell University
job2vec: Using Language Models to Understand Wage Premia
Abstract: There is a rich tradition in the economics literature of estimating wage premia for various occupational characteristics by applying hedonic regression (Mincer 1974; Heckman, Lochner, and Todd 2003). Hedonic regression techniques uncover the predictive value of occupational features for equilibrium outcomes in the labor market. We extend this tradition to neural language models, a newly applicable set of machine learning algorithms that can quantify text data from job postings. We use language models to uncover relationships between specific job posting features and wages. While many earlier efforts were limited to occupations as the primary unit of observation, the rich structure of neural language models like BERT (Devlin et al. 2018) permits posting-level analysis.
Using a new dataset from Greenwich.HR with salary information linked to posting data from Burning Glass Technologies, we apply a set of natural language processing (NLP) techniques to build a model that predicts wages from job posting text with very high accuracy. We extend the model with state-of-the-art machine learning techniques to interpret our predictions, assigning valuations to skills and other entities in the spirit of earlier hedonic regression techniques. We develop a method of injecting text with interpretable concepts into job postings to understand how wage predictions change in response to posting text differences, an approach we call “text injection” (Bana et al. 2021). We run text injection experiments to predict wage premia arising from variation in skills, regions, credentials, and firms. For further interpretability, we apply a technique called integrated gradients from the deep learning literature (Sundararajan, Taly, and Yan 2017) that provides alternative wage premia attribution estimates to specific words. Our results suggest that wage heterogeneity is broadly determined by the interaction of many factors. Our neural language approach offers one way forward to understand the complexity of these structures.
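As an illustration of the integrated gradients technique mentioned above, the following is a minimal sketch on a toy model, not the authors' implementation. The exponential-linear "wage" predictor, the weights `w`, the input `x`, and the all-zeros baseline are hypothetical stand-ins; the paper applies the method to a neural language model over posting text. The sketch approximates the path integral of gradients from the baseline to the input and checks that the attributions sum to the change in the prediction.

```python
import numpy as np

def model(x, w):
    # Toy differentiable predictor (assumption): exp of a linear score.
    return float(np.exp(x @ w))

def grad(x, w):
    # Analytic gradient of the toy model with respect to the input x.
    return np.exp(x @ w) * w

def integrated_gradients(x, baseline, w, steps=200):
    # Midpoint-rule approximation of the path integral of gradients
    # along the straight line from the baseline to the input.
    alphas = (np.arange(steps) + 0.5) / steps
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.array([grad(p, w) for p in path])
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical feature weights and a hypothetical posting representation.
w = np.array([0.5, -0.2, 0.1])
x = np.array([1.0, 0.0, 1.0])
baseline = np.zeros(3)

attributions = integrated_gradients(x, baseline, w)
total_change = model(x, w) - model(baseline, w)
print(attributions, attributions.sum(), total_change)
```

In this sketch, a feature that does not differ from the baseline receives zero attribution, and the attributions add up (to numerical precision) to the difference between the prediction at the input and at the baseline.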
Lessons from Nine Decades of Changes in Occupational Tasks
Abstract: Economic scholars have successfully documented the changes in the content of jobs and occupations since the advent of computers. However, with the notable exception of Atalay et al. (2020), few studies analyze the period leading up to the computer era. Moreover, while the focus on technological change proved useful to understand how certain aspects of our jobs changed (e.g., towards non-routine work), it may hinder a full appreciation of the societal job content changes that took place.
Using optical character recognition and natural language processing, we transformed the U.S. Dictionary of Occupational Titles (DOT, 1939 - 1991) into a database akin to, and comparable with, its digital successor, O*NET (1998 - today). After creating a single occupational classification, we connected all DOT waves and decennial O*NET databases into a single dataset stretching over nine decades, and merged it with information from the U.S. Decennial Census on employment, wages, and other labor market characteristics.
Using this database, we show that our task predictions corroborate the observed changes in the job task categories documented in previous work, but we also show that all types of manual work were in decline long before the advent of computers, and this decline is better attributed to mechanization and automation that predate the computer. We further show that in the pre-computer era, women worked in occupations dominated by relatively low-productivity technologies (e.g., stenotype, mimeograph, typewriter, dictation machine) that were replaced by a high-productivity technology (computers), explaining further why women adopted computers faster than men and why their work de-routinized faster, a shift that reduced the gender wage gap (Black and Spitz-Oener, 2010).
Learning Biased Technical Change
Abstract: The potential for technological change to raise demand for higher skill workers, or those who are more educated, able or experienced, occupies a central role in understanding increases in wage inequality and employment trends over the past fifty years. Because machine learning and artificial intelligence (AI) expand the set of tasks that can be done by capital instead of labor, there has been concern, and some early evidence, that they might lead to greater automation and wage polarization. While much of the existing empirical and theoretical work focuses on the potential for AI to automate tasks formerly performed by labor, we document that AI, because it is very good at learning optimal ways to perform tasks, has the potential to affect labor productivity by efficiently coaching and training workers. In this paper, we use a series of randomized controlled trials and natural language processing of sales and support conversations to show that AI can disproportionately benefit less experienced and less able customer service representatives. Our results highlight that improvements in technology’s ability to learn imply that continued improvements in technology need not be skill-biased.
- J2 - Demand and Supply of Labor
- J6 - Mobility, Unemployment, Vacancies, and Immigrant Workers