Using Text Data to Understand the Labor Market

Paper Session

Friday, Jan. 7, 2022 10:00 AM - 12:00 PM (EST)

Hosted By: American Economic Association
  • Chair: Erica Groshen, Cornell University

The Geography of Job Tasks

Enghin Atalay, Federal Reserve Bank of Philadelphia
Sebastian Sotelo, University of Michigan-Ann Arbor
Daniel Tannenbaum, University of Nebraska-Lincoln

Abstract

We present new facts about the geography of work using online job ads and introduce new measures of job tasks, technology requirements, and the degree of specialization within firms or occupations. We show that (i) the intensity of interactive and analytic tasks, (ii) technological requirements, and (iii) task specialization all increase with city size. The gradient for tasks and technologies is steeper for jobs requiring a college degree. We show that these facts help account for the urban wage premium, both in aggregate and across skill groups.

job2vec: Using Language Models to Understand Wage Premia

Sarah Bana, Stanford Digital Economy Lab
Erik Brynjolfsson, Stanford Digital Economy Lab
Daniel Rock, University of Pennsylvania
Sebastian Steffen, Massachusetts Institute of Technology

Abstract

There is a rich tradition in the economics literature of estimating wage premia for various occupational characteristics by applying hedonic regression (Mincer 1974; Heckman, Lochner, and Todd 2003). Hedonic regression techniques uncover the predictive value of occupational features for equilibrium outcomes in the labor market. We extend this tradition to neural language models, a newly applicable set of machine learning algorithms that can quantify text data from job postings. We use language models to uncover relationships between specific job posting features and wages. While many earlier efforts were limited to occupations as the primary unit of observation, the rich structure of neural language models like BERT (Devlin et al. 2018) permits posting-level analysis.
Using a new dataset from Greenwich.HR with salary information linked to posting data from Burning Glass Technologies, we apply a set of natural language processing (NLP) techniques to build a model that predicts wages from job posting text with very high accuracy. We extend the model with state-of-the-art machine learning techniques to interpret our predictions, assigning valuations to skills and other entities in the spirit of earlier hedonic regression techniques. We develop a method of injecting text with interpretable concepts into job postings to understand how wage predictions change in response to posting text differences, an approach we call “text injection” (Bana et al. 2021). We run text injection experiments to predict wage premia arising from variation in skills, regions, credentials, and firms. For further interpretability, we apply a technique called integrated gradients from the deep learning literature (Sundararajan, Taly, and Yan 2017) that provides alternative wage premia attribution estimates to specific words. Our results suggest that wage heterogeneity is broadly determined by the interaction of many factors. Our neural language approach offers one way forward to understand the complexity of these structures.
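
For readers unfamiliar with integrated gradients, the following is a minimal sketch of the general technique, not the authors' implementation. It attributes a prediction to input features by averaging gradients along a straight path from a baseline to the input. The function `toy_wage_model`, the three input features, and the all-zero baseline are hypothetical stand-ins chosen for illustration; a real application would use a trained language model and its token embeddings.

```python
import math


def toy_wage_model(x):
    """Hypothetical smooth 'log wage' predictor over three numeric features."""
    return 0.5 * x[0] + 0.2 * x[1] * x[2] + math.tanh(x[2])


def gradient(f, x, eps=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((f(up) - f(down)) / (2 * eps))
    return grads


def integrated_gradients(f, x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients.

    Attribution for feature i is (x_i - baseline_i) times the average of
    df/dx_i evaluated along the straight line from baseline to x.
    """
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = gradient(f, point)
        total = [t + gi for t, gi in zip(total, g)]
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Assumed example input and baseline (both illustrative).
x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_wage_model, x, baseline)
```

A useful sanity check is the completeness property: the attributions should sum, up to numerical error, to the difference between the model's output at the input and at the baseline.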

Lessons from Nine Decades of Changes in Occupational Tasks

Andre Assumpcao, Harvard University
Dario Diodato, Joint Research Centre of the European Commission
Ljubica Nedelkoska, Harvard University
Shreyas Gadgin Matha, Harvard University
James McNerney, Harvard University
Frank Neffke, Harvard University

Abstract

Economic scholars have successfully documented the changes in the content of jobs and occupations since the advent of computers. However, with the notable exception of Atalay et al. (2020), few studies analyze the period leading up to the computer era. Moreover, while the focus on technological change proved useful to understand how certain aspects of our jobs changed (e.g., towards non-routine work), it may hinder a full appreciation of the societal job content changes that took place.
Using optical character recognition and natural language processing, we transformed the U.S. Dictionary of Occupational Titles (DOT, 1939 - 1991) into a database akin to, and comparable with, its digital successor, O*NET (1998 - today). After creating a single occupational classification, we connected all DOT waves and decennial O*NET databases into a single dataset stretching over nine decades, and merged it with information from the U.S. Decennial Census on employment, wages, and other labor market characteristics.
Using this database, we show that our task predictions corroborate the observed changes in the job task categories documented in previous work, but we also show that all types of manual work were in decline long before the advent of computers, and this decline is better attributed to mechanization and automation that predate the computer. We further show that in the pre-computer era, women worked in occupations dominated by relatively low-productivity technologies (e.g., stenotype, mimeograph, typewriter, dictation machine) that were replaced by a high-productivity technology (computers), explaining further why women adopted computers faster than men and why their work de-routinized faster, a shift that reduced the gender wage gap (Black and Spitz-Oener, 2010).

Learning Biased Technical Change

Erik Brynjolfsson, Stanford Digital Economy Lab
Lindsey Raymond, Massachusetts Institute of Technology

Abstract

The potential for technological change to raise demand for higher-skill workers, or those who are more educated, able or experienced, occupies a central role in understanding increases in wage inequality and employment trends over the past fifty years. Because machine learning and artificial intelligence (AI) expand the set of tasks that can be done by capital instead of labor, there has been concern, and some early evidence, that they might lead to greater automation and wage polarization. While much of the existing empirical and theoretical work focuses on the potential for AI to automate tasks formerly performed by labor, we document that AI, because it is very good at learning optimal ways to perform tasks, has the potential to affect labor productivity by efficiently coaching and training workers. In this paper, we use a series of randomized controlled trials and natural language processing of sales and support conversations to show that AI can disproportionately benefit less experienced and less able customer service representatives. Our results highlight that improvements in technology’s ability to learn imply that continued improvements in technology need not be skill-biased.

Discussant(s)

Adrien Bilal, Harvard University
Stephen Hansen, Imperial College London
Anna Salomons, Utrecht University
Avi Goldfarb, University of Toronto

JEL Classifications
  • J2 - Demand and Supply of Labor
  • J6 - Mobility, Unemployment, Vacancies, and Immigrant Workers