Open Science and Research Data Management: a workshop review
As Open Data and the FAIR Principles increasingly become standard in science, transparency and reproducibility of the whole research workflow are starting to be recognized as good scientific practice (see, for instance, the SPARC initiative openscholarchampions.eu). To make the transition towards Open Data and the implementation of the FAIR Principles smooth and approachable, we designed a 2-day course on Open Science and Research Data Management. In this course, we give researchers guidelines for designing their research data management in a way that also saves them time. In our view, data management is a key skill for success in research.
Video 1: RDM saves time and effort.
The first Data Management workshop was held in Bochum in November 2018. After a warm-up with introductory discussions and an introduction to Open Science principles and values, we worked on the definition of Open Data and its relevance in the current research ecosystem. Emphasizing that collaboration is at the center of Open Science, I introduced GitLab, an open source alternative to GitHub (a platform used by many Open Science enthusiasts, including the Mozilla Researchers community, to manage open projects). Participants were asked to populate a GitLab-based website (kept private on the university server) with their personal information. We then looked at research data management principles: working on their own projects in small groups, participants defined the raw and primary data, the data flow and the main objective of their studies. Each participant then had one minute to explain their experiments to the plenum in an elevator pitch format. In the afternoon of day 1 we addressed questions around digital data organisation, file naming and proper spreadsheet structure, with the aim of creating machine-readable outputs. Finally, participants were asked to write a data management plan in GitLab and to give each other peer feedback.
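The file naming advice can be illustrated with a short Python sketch. This is not taken from the workshop material: the function name `make_filename`, the chosen field order (date, project, description, version) and the example values are all assumptions, picked only to show what a machine-readable name can look like.

```python
import re
from datetime import date


def make_filename(project, description, day, version, ext):
    """Build a machine-readable file name (hypothetical convention).

    ISO date first so files sort chronologically, underscores between
    fields, hyphens within fields, and a zero-padded version number.
    """
    def slug(text):
        # Lower-case and replace every run of other characters with a hyphen
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{day.isoformat()}_{slug(project)}_{slug(description)}_v{version:02d}.{ext}"


# Example values are invented for illustration
name = make_filename("Fly Memory", "raw scores", date(2018, 11, 5), 1, "csv")
print(name)  # 2018-11-05_fly-memory_raw-scores_v01.csv
```

Because every field sits at a fixed position and is separated by an underscore, such names can be split and filtered by a script as easily as they can be read by a person.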
The second day started with an introduction to collaborative working and why Open Science is not so much about which tools or platforms to use, but rather about connecting with other people in ways that allow researchers to collaborate more efficiently. We then discussed good and bad scientific practices, relating these to data handling challenges. I introduced Open Access and Creative Commons licensing before we moved on to the reproducibility crisis and how Open Science practices can solve many of its related issues. We shed some light on Open Data, Open Materials, Open Methodology and Open Source Software for data analysis, and on why all of these components of Open Science are necessary but not sufficient steps toward reproducible results. I also presented and discussed digital tools and services such as open data repositories (Zenodo, Figshare, re3data.org, …), services for Open Materials and Methods (protocols.io, RRID), and data analysis with programming languages (R, Python and MATLAB). Throughout the second afternoon, we worked with RStudio in connection with GitLab, building a reproducible analysis (in R or Python) from a sample dataset provided to the participants.
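To give a flavour of what a scripted, reproducible analysis looks like, here is a minimal Python sketch. It is not the exercise used in the workshop: the dataset, the column names (`subject_id`, `group`, `score`) and the function `group_means` are invented for illustration. The data are in a tidy layout (one row per observation, one column per variable), and every step from raw values to result is written down in code, so anyone can rerun it and get the same numbers.

```python
import csv
import io
import statistics

# Invented sample data in a tidy layout: one observation per row
SAMPLE_CSV = """subject_id,group,score
s01,control,4.0
s02,control,5.0
s03,control,6.0
s04,treated,6.5
s05,treated,7.5
s06,treated,8.5
"""


def group_means(csv_text):
    """Return the mean score per group, computed from raw CSV text."""
    scores = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores.setdefault(row["group"], []).append(float(row["score"]))
    # Sort the groups so the output order does not depend on the input order
    return {group: round(statistics.mean(values), 2)
            for group, values in sorted(scores.items())}


print(group_means(SAMPLE_CSV))  # {'control': 5.0, 'treated': 7.5}
```

In a real project the CSV text would be read from a versioned data file, and the script itself would be kept under version control alongside it.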
While we want to keep the nice atmosphere, preserve time for questions and interaction, and maintain a good balance between lectures and practical exercises, we are working on the next version of the workshop, which will add:
- A discussion on the usefulness of online presence and the use of digital services such as ORCID
- A walkthrough on how to link RStudio and GitLab/GitHub, sent prior to the workshop
- Specific exercises to improve the practical part and increase participant engagement
We conclude that researchers should be familiarized with FAIR research data management practices early in their careers.
In order to best deliver the skills and tools researchers need to embrace Open Science and FAIR Data Management easily and efficiently, we are continuously adapting and updating the course content and materials. Our teaching material is available on GitHub.
Two of the slides shown in the workshop: “Why open science” at the beginning of the workshop, “Version control and git” on the second day.
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016). doi:10.1038/sdata.2016.18
Find additional resources on the Open Science MOOC website.