Who me? Share my data with strangers? Aren’t they my competitors? Would they use my data to criticize me? Would they take the credit (through publication) for my hard work? Would they understand my data well enough to arrive at valid results and conclusions? How can I protect the privacy of my subjects and the confidentiality of my data if I share my data with the world? I recognize the importance of data sharing in some fields but I don’t think it is relevant to my research.
Comments such as these have been heard throughout the last 50 years whenever data sharing is discussed. The uninitiated tend to regard data sharing as a great new idea — and then typically go on to mention reasons why it would not work in their discipline. In fact, large-scale international data sharing has existed for almost 150 years, and has evolved to include sharing of most kinds of scientific data.
Entire disciplines are now based largely on shared data. For example, the field of macroeconomics is based largely on government data that are shared with economists. Society and government benefit from the knowledge generated by (mostly academic) economists, who, in turn, build their careers on work with data they could not feasibly gather on their own.
A very different discipline, meteorology, has developed since before 1900, based on data shared internationally. An international standard for weather observation was adopted by the Vienna Congress of 1873, and countries worldwide have provided daily meteorological data at a steadily increasing rate ever since. For example, 1,632 Indian stations provided daily data in 1901; 2,536 stations in 1970 (Jenne, p.6, in Sieber, Sharing Social Science Data, 1989, SAGE). There are many conceivable reasons why one might think sharing meteorological data would not work: for example, national security interests, inaccuracy of data gathering and transmission by some participating countries, and challenges of storing and analyzing so much data. However, all of these challenges have been met. Imperfect weather data are better than no data at all, security risks are minor compared to the advantages of having these data, and modern computers have diminished the challenges of storing and analyzing this huge and rapidly growing database. These data are used in all of the other geophysical sciences (solar terrestrial physics, glaciology, and solid earth geophysics, to name just a few), in biology, agriculture, demography, and economics, and in response to current world concerns such as global warming and drought.
Each discipline raises its own set of challenges, risks and benefits of data sharing, and with ingenuity the risks can be overcome. For example, the very expensive sciences such as astronomy, oceanology and space exploration have advanced rapidly because scientists share equipment (e.g., telescopes) and samples (e.g., polar ice cores, Martian samples). The challenges of administering such sharing are vastly different from the challenges of sharing meteorological or economic data. Sharing social and biomedical data, which raises overwhelming concerns about confidentiality and informed consent, is again entirely different, and the practice of sharing such data is by now quite well developed.
How has it been possible to advance the practice of data sharing given these complexities? What hope is there that data sharing will now become a worldwide practice within and among most of the countries of the world? As editor of the Journal of Empirical Research on Human Research Ethics (JERHRE), I had the pleasure of working with University of Oxford researchers Susan Bull and Michael Parker, who served as special editors of five studies they conducted with partners in Kenya, South Africa, India, Thailand and Vietnam, respectively, to investigate the views of researchers and gatekeepers of research concerning the initiation of biomedical data sharing in their country. The respondents, while positive about the general concept, were more skeptical about sharing their own data, not unlike persons in Western countries who have not shared data. Additionally, researchers from non-Western countries have concerns about their status in the world.
The thing that I would predict might be the biggest obstacle is xenophobia. You know if you got a big dataset build up by someone like an American funder, it’s likely that it is going to be very easy to access as an American researcher and really difficult to access by anybody who’s not an American. (Senior Thai researcher, p.286)
For many, data sharing was not a high priority. Many researchers regarded their role as focusing on scientific and technical aspects of a project in development, while minimizing commitments such as documentation of their data.
…no one can imagine the way ahead because they only pay attention to the technical aspects [of research]. Data sharing is not a priority [in Vietnam] at this point. (Ethics Committee member, p. 254)
Still, even within Vietnam, there were seen to be economic advantages to data sharing:
In foreign countries, students are granted money to do research on their own, but it is not the same in Vietnam. We should simplify the procedures to help students practice on real data sets. (Government officer, p. 255)
And there were many versions of “Who me? Share my data?” For example:
There are many, many reasons I might not want to embark on this ship of data sharing. One of them might be that there’s work involved; it’s going to take time. I’ve got to clean the data sets and I’ve got to respond to their applications for it. I’ve got to put in a data sharing policy. I’ve got to convene that little committee that I’m talking about. … All of this is more work. If I don’t have to do this, why should I start? (Senior researcher, p. 244)
In every movement to institutionalize norms of data sharing, there has been the cry of “Who me? Share my data?” But the reasons to share data are overwhelmingly in favor of the practice. Data are the foundation of empirical research in all of the sciences. To understand and build on the work of others, researchers often need access to the data on which the work is based. Data sharing reinforces the norm of openness in scientific inquiry. It fosters verification, refutation or refinement of existing findings. It promotes new research, new ideas and alternative perspectives on any given scientific problem. It encourages more appropriate use of empirical data in policy making and evaluation. It fosters improvement of measurement and data collection methodology. And it provides a powerful pedagogical tool for teaching students research design, analysis and interpretation of findings. (These advantages are discussed extensively in the U.S. National Academy of Sciences book Sharing Research Data by Fienberg, Martin and Straf, 1985.)
Not so fast, you say. This is all glib talk. There are many obstacles to data sharing. There is the cost of documenting data. Funders are willing to pay for documentation of important data archives, but not for documentation of “data graveyards”: archives no one would want to use. Data must be rendered anonymous without destroying their usefulness to secondary users. Some important data are proprietary. Some data, if shared, could lead to harmful uses or pose national security risks.
True. This is precisely why data sharing practices advance slowly. However, as we see in the July 2015 issue of JERHRE, not only have these issues been handled by well-developed methods and policies; those policies and practices are now also being shared collaboratively between institutions in developed countries and their low- to middle-income counterparts. George Alter and Mary Vardigan of the Institute for Social Research, Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan, offer advice on issues of informed consent, data management, data dissemination, and validation of research contributions. Indeed, ICPSR has worked with research administrators in various developing countries and enabled them to solve these problems in their context, and thus to avoid having to “reinvent the wheel” in their own country.
In the July 2015 issue of JERHRE we see the emergence of exciting next steps in human data sharing. Gatekeepers of biomedical research in India, Kenya, South Africa, Thailand and Vietnam examine the possibility of sharing their biomedical data. We also learn from data sharing experts throughout the world how the problems raised by these gatekeepers can be (and have been) solved effectively within developing countries.
Various diseases that pose a serious threat to the world have mutated from animal species to humans in Africa and Southeast Asia. With the globalization of biomedical research and concerns about possible pandemics of such diseases as HIV, SARS and Ebola, the institution of data sharing in developing African and Asian countries is timely. Data sharing enables researchers worldwide to build on the efforts of others in a cost-effective way. Baseline data will be in place when epidemics strike. The political, scientific, and economic problems of understanding and stopping new diseases will be vastly reduced when an infrastructure and baseline data are readily available to scientists. We can look forward to important advances in biomedical science in connection with this new development.
… and the scientists who participate in sharing worthwhile data will experience the career advances that come when playing in the scientific big leagues.