The post that follows is by our guest author Alice Motes, who is a Research Data and Preservation Manager at the University of Surrey, UK.
What’s Open science?
Great question! Open science refers to a wide range of approaches to doing science including open access to publications, open data, open software/code, open peer review, and citizen science (among others). It is driven by principles of transparency, accessibility, and collaboration resulting in a very different model of production and dissemination of science than is currently practiced in most fields. There tends to be a gap between what scientists believe and how they actually behave. For example, most scientists agree that sharing data is important to the progress of science. However, fewer of those same scientists report sharing their data or having easily accessible data (Tenopis et al. 2011).
In many ways, open science is about engaging in full faith with the ideals of the scientific process, which prizes transparency, verification, reproducibility, and building on each other’s work to push the field forward. Open science encourages opening up all parts of the scientific process, but I want to focus on data. (Conveniently, the area I’m most familiar with! Funny that.) Open data is a natural extension of open access to academic publications.
Most scholars have probably benefited from open access to academic journals. (Or hit a paywall to a non-open access journal article. Did you know publishers have profit margins higher than Apple?) The strongest argument behind open access is that restrictions on who can get these articles slows scientific advancement and breakthroughs by disadvantaging scientists without access. Combine that with the fact that most research is partially or wholly funded by public money and it’s not a stretch to suggest that these outputs should be made available to the benefit of everyone, scientist and citizens.
Open data is extending this idea into the realm of the data, suggesting that sharing data for verification and reuse can catch errors earlier, foster innovative uses of data, and push science forward faster and more transparently to the benefit of the field. Not to mention the knock on benefits of those advances to the public and broader society. Some versions of open data advocate for broaden access beyond scientific communities into the public sphere, where data may be examined and reused in potentially entrepreneurial ways to the benefit of society and the economy. You may also see the term open data applied in relation to government agencies at all levels releasing data that they hold as part of a push for transparency in governance and potential reuse by entrepreneurs, like using Transport for London’s API to build travel apps.
What are the potential benefits to open data?
You mean beyond the benefits to your scholarly peers and broader society? Well there are lots of ways sharing data can be advantageous for you:
- More citations – there’s evidence to suggest that papers with accompanying data get cited more. (Piwowar and Vision 2013).
- More exposure and impact – more people will see your work, which could lead to more collaborations and publications.
- Innovative reuse – your data may be useful to in ways you don’t anticipate outside your field, leading to interdisciplinary impact and more data citations.
- Better reproducibility: The first reuser of your data is actually you! Plus, avoid a crisis. (Need more reasons? Check out selfish reproducibility).
Moreover, you’ll benefit from access to your peers shared data as well! Think about all the cool stuff you could do.
Great! I’m on board. How do I do it?
Well you just need to answer these three questions, really:
1. Can people get your data?
How are people going to find and download your files? Are you going to deposit the data into a repository?
2. Can people understand the data?
Ok so now they’ve got your data. Have you included enough documentation that they can understand your file organization, code, and supporting documents?
3. Can people use the data?
People have got a copy of your data and they know how to use it. Grand! But can they actually use it? Would someone have to buy expensive software to use it? Could you make a version of your data available in an open format? Have you supplied the code necessary to use the data? (Check out the Software Sustainability Institute for tips.)
For more check out the FAIR principles (Findable, Accessible, Interoperable and Reusable.)
Of course, there are some very good ethical, legal, and commercial reasons why sharing data is not possible, but I think the goal should be to strive towards the European Commission’s ideal of “as open as possible, as closed as necessary”. You can imagine different levels of sharing, expanding outward: within your lab, within your department, within your university, within your scholarly community, and publicly. For most funders across North America and Europe, they see data as a public good with the greatest benefit coming from sharing data to the widest possible audience and encourage publicly sharing data from their funded projects.
Make an action plan or a data management plan
Here are some things to do help you get the ball rolling on sharing data:
- Get started early and stay organized: document your research anticipating a future user. Check out Center for Open Science’s tools and tips.
- Deposit your data into a repository (e.g. Zenodo, Figshare.) Many universities have their own repository. Some repositories integrate with github, dropbox, etc. to make it even easier!
- Get your data a DOI so citations can be tracked (Repositories or your university library can do this for you.)
- Consider applying a license to your data. Don’t be too restrictive though! You want people to do cool things with your data.
- Ask for help: Your university likely has someone who can help with local resources. Probably in the library. Look for “Research Data Management”. You might find someone like me!
But I don’t have time to do it!
Aren’t you already creating documentation for yourself? You know, in case someone questions your findings after publication or if reviewer 2 (always reviewer 2 :::shakes fist:::) questions your methods or in a couple months when trying to figure out why you decided to run one analysis over another. Surely, making it intelligible to other people isn’t adding much to your workflow…or your graduate assistant’s workflow? If you incorporate these habits early in the process you’ll cut down the time necessary to prepare data at the end. Also, if you consider how much time you spend planning, collecting, analyzing, writing, and revising, the amount of time it takes to prepare your data and share it is relatively small in the grand scheme of things. And why wouldn’t you want to have another output to share? Matthew Partridge a researcher from University of Southampton and cartoonist at Errant Science has a great comic illustrating this:
In sum, open science and open data is a model for a more transparent and collaborative type of scientific inquiry. One that lives up to the best ideals of science as a community effort all moving towards discovery and innovation. Plus you get a cool new output to list on your CV and track its impact in the world. Not a bad shake if you ask me.