The movement to share data and research openly and freely for the sake of scientific advancement has been around for a while, with strong advocates and adherents whom we count among our favorite people. Nonetheless, a huge driver of the growing concern for data sharing and rigorous data management has been the need to comply with data sharing requirements, not only from the journals where researchers publish their papers but from grant-giving organizations themselves. We wondered what drove those increased requirements, and while we often talk to scientific researchers about their needs and frustrations, we were curious about the perspective of the grant-giving organizations. So we got in touch with two of the biggest funders in the life sciences in the U.S., the National Institutes of Health (NIH) and the National Science Foundation (NSF), to learn more about what they actually require and why they require it.
When did the requirements for data management plans and data sharing start?
Public requirements, policies, and practices for data management plans and data sharing have been around since the early 2000s. In 2003 the NIH issued its Data Sharing Policy, which required all applicants requesting $500,000 or more in direct costs in any single year to submit a data sharing plan in their applications, and the NSF put a requirement for a data management plan in place in January 2011. In 2014, the NIH established a separate policy for sharing genomic research data, called the Genomic Data Sharing Policy, which built on a previous policy called the Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies.
An Office of Science and Technology Policy memo issued in February of 2013 catalyzed the push for hard-and-fast policies for data sharing and requirements for data management plans. The memo, which largely addresses requirements for federal agencies conducting research, also includes a very important section that affects many of the country’s life science researchers:
Ensure that all extramural researchers receiving Federal grants and contracts for scientific research and intramural researchers develop data management plans, as appropriate, describing how they will provide for long-term preservation of, and access to, scientific data in digital formats resulting from federally funded research, or explaining why long-term preservation and access cannot be justified;
Both the NSF and NIH followed with policies of their own that closely aligned with this memo. In February 2015, the NIH issued the “National Institutes of Health Plan for Increasing Access to Scientific Publications and Digital Scientific Data from NIH Funded Scientific Research”, also called the NIH Public Access Plan. The NSF’s public access requirements, outlined in “Today’s Data, Tomorrow’s Discoveries: Increasing Access to the Results of Research Funded by the National Science Foundation”, went into effect on January 25, 2016.
Both the NIH and NSF have made efforts beyond the requirements set forth in these documents to support an ecosystem of scientific work built around sharing data. According to a statement from an NIH spokesperson to Ovation:
“There are a number of other concerted efforts underway that will enable us to take data sharing policy efforts to a higher level – these include the establishment of an Office of Data Science within the Office of the Director to facilitate a trans-NIH focus towards data preservation, access, analysis, and discovery. Dr. Phil Bourne was named NIH Associate Director for Data Science in 2013. In this role, Dr. Bourne coordinates strategic planning and develops approaches to enable the utilization and extraction of knowledge from the data generated by, and relevant to, NIH research. NIH also established a NIH Scientific Data Council in 2013, a high-level internal NIH group, to address the growing challenges and opportunities associated with ‘big data’ and data science in biomedical research.”
For its part, the NSF holds regular workshops directed toward the different scientific disciplines that its grants support, in order to help those researchers craft data management plans that are “suited to the values and needs of the respective research disciplines”.
What do the NIH and NSF actually require, and where do those requirements come from?
The NSF built its data management requirements and rubric on many years' worth of pre-existing data management practices, and the result seems especially sensitive to the diversity of scientific practice and process across the disciplines that it funds. "NSF's public access plan reflects input from the scientific community, recognizes the diversity among the scientific fields that the agency supports, and seeks to minimize burden on both awardees and staff," said Dan Arvizu, Chairman of NSF's National Science Board.
Given this commitment, data management plans submitted to the NSF are individualized rather than held to a one-size-fits-all rubric, and are “individually evaluated by a panel of disciplinary experts.” At the NIH, plans are likewise evaluated individually, by a “panel of scientific peers.” The NIH’s statement elaborated on this process:
“Reviewers are instructed to assess the reasonableness of the data sharing plan or the rationale for not sharing research data, as there may be certain data that cannot be shared (e.g., due to privacy concerns, human subject protections, informed consents, state laws). If the grant is funded, the data sharing plan becomes a term and condition of the grant award stated in the Notice of Award sent to the grantee. Once the plan becomes a part of the terms and conditions, the grantee is then required to follow through on the actions specified in the plan. NIH program staff are responsible for assessing the appropriateness and adequacy of the proposed data-sharing plan.”
The NSF, meanwhile, has clear requirements both for publications resulting from NSF-funded research and for the data that informed those publications. Award recipients are strongly encouraged to publish their findings, and publications resulting from awards must be deposited in a repository where they are available free of charge no later than 12 months after initial publication. For this purpose, the NSF has established its own public access repository in partnership with the Department of Energy's Office of Scientific and Technical Information. Additionally, and very importantly, the NSF requests that researchers also share “the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants,” with some exceptions to protect confidential information, which the researcher must identify in cooperation with the NSF.
Why enact data management and data sharing policies?
When the NSF announced its public access plan, NSF Director France Córdova said: "Scientific progress depends on the responsible communication of research findings. NSF's public access plan is another effort we have undertaken to emphasize the agency's central mission to promote the progress of science." The NIH specifically notes its desire to create “synergies that advance research.” In fact, both the NIH and NSF made clear their belief that reducing barriers to communication within the scientific world was a primary goal in their effort to promote scientific research. The OSTP memo was an important stake in the ground for both the NSF and NIH, furthering efforts that both organizations had begun years before. Interestingly, both organizations have emphasized data management as a critical part of successful data sharing. The NIH statement notes that this comes from the need for shared data to be “findable, accessible, interoperable and reproducible,” adding that “data management plans must also be certain to protect privacy, proprietary interests, and preserve the balance between the benefits of access/preservation and the costs.” The NIH also notes its hope that, as an ancillary benefit, these policies “will create innovative economic markets for services related to curation, preservation, analysis, and visualization.”
So...how’s it actually going?
A recent post in Medical Xpress highlights a study published in Academic Medicine which shows that data sharing has increased significantly since the early 2000s (although largely through third parties, rather than via peer-to-peer collaborations). On the other hand, we recently wrote a blog post about a study by Amanda Whitmire and Steven Van Tuyl, the findings of which they published in the excellent paper entitled “Water, Water, Everywhere: Defining and Assessing Data Sharing in Academia.” For their study, they developed their own metric, the DATA score, to assess just how shareable the data from a set of NSF-funded research studies really was, and in the process assessed those studies’ level of compliance with NSF requirements. The results did not suggest, unfortunately, that the requirements put in place by the NSF (and, one can safely assume, the NIH as well) have had the impact on data sharing that the NSF intended.
We at Ovation take a few things from this. First and foremost, the requirements set forth by the NIH and NSF have only just begun to affect the practice and process of research, and data sharing will increase and data management will improve in the years to come. However, this won’t happen quickly or easily unless researchers are empowered with tools and infrastructure that make data management easy and intuitive, and that make data sharing sufficiently beneficial to the researchers and scientists who worked tirelessly on data collection and analysis. But it goes deeper: a recent Washington Post article dove into the controversy around “research parasites,” and cited a study published in the Journal of the American Medical Association, which found that researchers had requested only 15.5% of the clinical trial data made available in the three online portals studied. This suggests a couple of things to us. First, searchability alone can’t solve the problems of data sharing and bridge the gap between data silos. Second, and more importantly, while placing data into repositories might meet compliance standards, that data is often not as usable as researchers would like because it lacks the context, or as we call it, provenance, of the experimental and analytical pipeline from which it originated (by the way, Ovation provides a pretty great provenance graph of project data).
As noted by the NIH, their hope is that “innovative economic markets” will arise to help scientists meet the challenge of managing and sharing data. We are proud that Ovation is among the organizations and companies that seek to assist scientists by focusing squarely on their challenges and providing practical, intuitive tools to help them focus on what they do best: pursuing world-changing scientific discoveries. We’d love to have a conversation with you about how data management and sharing requirements have affected you. Sign up for a demo and we’ll give you a call, email us at firstname.lastname@example.org, or tweet at us at @ovation_io. We look forward to learning from you, and working together on solutions to enable great science.