The National Science Foundation is committing $38 million to establish a new platform to wrangle the ever-increasing amount of social and behavioral science data. Collateral goals include broadening and diversifying participation in the social and behavioral sciences and making their findings more accessible outside of academe.
The NSF – the United States’ premier funder of social and STEM basic research – announced the Research Data Ecosystem: A National Resource for Reproducible, Robust, and Transparent Social Science Research in the 21st Century on February 4. The project is supported by the larger NSF-wide Mid-scale Research Infrastructure program.
The University of Michigan Institute for Social Research will oversee the creation of new data archives and software that researchers can use to access, organize, analyze and contribute data. The project is expected to be completed in January 2027.
An abstract for the grant notes how incompatible standards for data, lack of interoperability, and the inherent difficulty of managing big data all add friction to research, creating “an urgent need for new modes of access, confidentiality protection, methodological approaches, and tools.” By “modernizing” data management in the social and behavioral sciences, the Research Data Ecosystem is expected to create opportunities for researchers that are not possible with existing infrastructure.
“To truly leverage the societal benefits of science, researchers across the U.S. must be able to access and analyze critical data with greater efficiency while simultaneously maintaining rigorous standards for privacy and scientific integrity,” Kellina Craig-Henderson, the NSF’s acting assistant director for social, behavioral and economic sciences, is quoted as saying in a release. “The Research Data Ecosystem project will modernize the management and use of many types of people-centered data, thus accelerating multidisciplinary research focused on serving society and improving the lives of people all over the country.”
Specifically, the abstract explains that the ecosystem will develop software that enables:
1) Interoperability: An integrated system for the entire research data lifecycle, so that work done early in the data lifecycle is useful at later stages, making it possible to integrate data from different sources;
2) Reproducibility: Making it easier to reproduce and build on prior research results by being able to find and re-use data and code;
3) Transparency: Providing information about the provenance of research data, including source, code, method of collection, etc.;
4) Increased efficiency of data sharing: Reducing burden on data producers in sharing data and ensuring that shared data are findable, accessible, interoperable, and reusable; and
5) Confidentiality protection while increasing research access.
To achieve these goals, the project will develop a metadata specification, the Research Data Description Framework, similar to the Resource Description Framework.
The principal investigator for the ecosystem is economist Margaret Levenstein, who directs the Inter-university Consortium for Political and Social Research based at the University of Michigan. Co-principal investigators are psychologist Richard Gonzalez, economist David Lam, political scientist Kenneth Kollman and sociologist Jeffrey Morenoff.