Apr 29 -- The National Center for Education Research (NCER), a center within the U.S. Department of Education's Institute of Education Sciences (IES), funds and coordinates high-quality, innovative research that addresses the biggest challenges facing education in the 21st century. Through this request for information (RFI), NCER seeks public input to help us identify existing large datasets that may be useful for research and to understand the challenges and limitations that may affect access to these datasets and their value for research. We must receive your comments by May 31, 2022.
The number of large education-related datasets is growing, and we have new opportunities to leverage these data to address critical questions of policy and practice. For example, State longitudinal data systems (SLDSs) can support research on the questions that State agencies have about a specific education issue, program, or policy. SLDSs have the potential to support lower-cost, faster research by avoiding the need for costly primary data collection. Similarly, education technologies generate large amounts of data that, once students' privacy is protected, can potentially provide valuable insights about learning. Despite the large amount of raw data collected by these technologies, there are legal, practical, and methodological barriers to conducting research that leverages these types of datasets to understand and improve students' education outcomes. Education researchers seeking to conduct studies using these datasets confront challenges related to the validity of data elements and to the logistics of accessing data in ways that protect students' privacy, consistent with local, State, and Federal law. Researchers face significant barriers and costs in accessing these datasets. As a result, only a small number of education studies have large sample sizes, despite the known advantages of these types of studies.
There are examples of the potential insights to be gained from these data, and the fields of educational data mining and learning analytics have developed methods for working with large datasets. For example, researchers have analyzed data collected in the digital administration of the National Assessment of Educational Progress (NAEP), which has led to insights into multiple aspects of students' test-taking strategies.
Data privacy is central to the ethical conduct of research. Any plans to leverage the large amounts of data that are being collected through education technology, State longitudinal data systems, and other sources must be designed to minimize the risk of disclosure in order to protect the privacy of students.
Through this RFI, we seek public comment to help us identify existing large datasets, especially those that are generated using education technology, that may be useful for research; identify best practices for creating new, large datasets that are valuable for research; understand the challenges and limitations that may impact data access; and develop and implement plans to protect students' privacy.
This is a request for information only. This RFI is not a request for proposals (RFP) or a promise to issue an RFP or a notice inviting applications. This RFI does not commit the Department to contract for any supply or service whatsoever. Further, we are not seeking proposals and will not accept unsolicited proposals. The Department will not pay for any information or administrative costs that you may incur in responding to this RFI. The documents and information submitted in response to this RFI will not be returned.
We will review every comment, and the comments in response to this RFI will be publicly available on the Federal eRulemaking Portal at www.regulations.gov.
We invite stakeholders who are aware of large datasets relevant to education and learning, especially those generated through education technology; stakeholders who have perspectives on the value of these datasets for education research; and stakeholders who are aware of challenges and limitations affecting both access to and use of large datasets to share responses to the following questions in their comments:
(1) What public or restricted-use education-related datasets are available for training students in data mining/machine learning methods? What training needs are not being met by the datasets that are currently available?
(2) What open or restricted-use education-related datasets are available to train new artificial intelligence models or to test hypotheses using data mining/machine learning methods? What research needs are not being met by the datasets that are currently available?
(3) What work do researchers need to do to access, and then explore the quality of, an existing dataset before conducting research with it? What aspects of this work could be reduced or conducted just once so that future researchers can reduce the time needed to complete a research project?
(4) How do researchers determine the validity of data elements within previously collected datasets? What challenges are frequently encountered related to how those data align to constructs of interest?
(5) What are promising approaches to testing and improving the validity of metrics within large datasets, especially those datasets that are developed through interactions with education technology?
(6) How likely is it that existing datasets, especially those that come out of education technology, contain data that are valuable for researchers and of sufficient quality that research could be conducted with a high degree of rigor?
(7) To what extent do existing datasets capture enough information to address research questions related to diversity, equity, inclusion, and accessibility? What additional data should be collected to address these questions?
(8) What are the best practices for creating new datasets or linking existing datasets and sharing them with researchers (open or restricted use) while prioritizing the privacy of individuals and adhering to local, State, and Federal laws? What barriers and limitations exist?
(9) What role can IES play in developing infrastructure that supports the use of large-scale datasets for education research?