Research Objects are a wonderful thing! Uh, but just what is a Research Object? One definition, according to Workflow4Ever, is “Research Objects (ROs) are semantically rich aggregations of resources that bring together data, methods and people in scientific investigations.” Uh, and doesn’t a ‘semantically rich aggregation of resources that bring together data, methods and people in scientific investigations’ sound just like a weird definition of a scientific paper? Well, that’s what I thought.
One of the core missions of ReproNim is to support the generation of completely re-executable research publications, aka the ReproPub (Kennedy 2019). A ReproPub can be used to verify a study’s claims and explore their generalizability through systematic variation of the input data and analytic components. Such a publication includes the complete description (i.e. provenance) of the experimental data, the analysis workflow (script, pipeline), the execution environment (OS version, hardware specification), and the results that are used to establish that publication’s claim(s).
Yet these elements (the data, workflow, environment and results) are themselves Research Objects: scholarly products that may each have their own history, evolution, creators, credit, and reusability. This makes the ReproPub itself an overarching mechanism (a Research Object) that aggregates these subsidiary Research Objects in support of a specific set of claims.
Example
In the ReproPub by Ghosh, et al. (2017), care is taken to explicitly annotate the constituent objects:
- Data at: doi:10.18116/C6C592
- Workflow and Results at: https://github.com/ReproNim/simple_workflow and archived at: doi:10.5281/zenodo.800758
- Environment at: https://hub.docker.com/r/repronim/simple_workflow/tags/1.1.0
Additional objects, of course, can be included, such as hypothesis pre-registration, separation of image analysis workflow and statistical analysis workflow, etc.
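To make the aggregation idea concrete, here is a minimal sketch of how the objects listed above could be recorded in a small machine-readable manifest. This is not a ReproNim tool or format: the class names (ResearchObject, ReproPub), the role labels, and the missing_roles check are all hypothetical and exist only for illustration. The identifiers are the ones given in the list, and bare DOIs are turned into links through the standard https://doi.org/ resolver.

```python
from dataclasses import dataclass, field

# Hypothetical role labels, taken from the four elements named in the text.
REQUIRED_ROLES = ("data", "workflow", "environment", "results")


@dataclass(frozen=True)
class ResearchObject:
    """One constituent object: its role plus every place it can be found."""
    role: str
    locations: tuple  # DOIs (starting with "10.") or plain URLs

    def urls(self):
        # A bare DOI resolves through doi.org; URLs are passed through as-is.
        return [
            "https://doi.org/" + loc if loc.startswith("10.") else loc
            for loc in self.locations
        ]


@dataclass
class ReproPub:
    """A publication viewed as an aggregation of Research Objects."""
    citation: str
    objects: list = field(default_factory=list)

    def missing_roles(self):
        # Report which of the four elements have not been annotated yet.
        present = {obj.role for obj in self.objects}
        return [role for role in REQUIRED_ROLES if role not in present]


# Assumption: the workflow and the results share the same repository and
# archive, as the list above describes them together.
SHARED = ("https://github.com/ReproNim/simple_workflow", "10.5281/zenodo.800758")

ghosh_2017 = ReproPub(
    citation="Ghosh, et al. (2017)",
    objects=[
        ResearchObject("data", ("10.18116/C6C592",)),
        ResearchObject("workflow", SHARED),
        ResearchObject("results", SHARED),
        ResearchObject(
            "environment",
            ("https://hub.docker.com/r/repronim/simple_workflow/tags/1.1.0",),
        ),
    ],
)

if __name__ == "__main__":
    print("Missing roles:", ghosh_2017.missing_roles())
    for obj in ghosh_2017.objects:
        print(obj.role, "->", ", ".join(obj.urls()))
```

Additional objects, such as a hypothesis pre-registration, would simply be further entries in the objects list with their own role label.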
Summary
Creating each of these complex Research Objects is still hard and needs to be made easier. Some tools ReproNim is working on in this area include:
- NeuroDocker/ReproEnv – Simplifying creating containers
- DataLad – Version control and publication of data and containers
- ReproMan (& DataLad) – Managing running of containers locally or on distributed resources
- TestKraken – Continuous integration and stability assessment
Facilitating the evolution of publications to fully provisioned Research Objects has numerous benefits. A ReproPub embraces many of the recent advances and evolutions in publication: treatment of data as a first-class research object (Honor et al. 2016); the principles of software citation (Katz et al. 2019); and the FAIR (findable, accessible, interoperable and reusable) principles (Wilkinson et al. 2016) applied to the scientific process itself. The resulting scientific literature not only supports reproducibility better but also allows a much fuller and more precise description, by adding to the publication the elements (input data, analytic components) for which the claims do, or do not, generalize. Culturally, this evolution of publication practice should be perceived as a plus for the scientific community. Specifically, what used to be one publication that referred to data, processing, and a set of claims can now conceivably become several publications, each a distinct and independently creditable scientific output: a publication for the data, a publication for the processing approach, a publication for the complete results, in addition to the publication for the conclusions and claims.
Acknowledgements
Elements of this work are based upon a presentation of a conference paper at the Workshop on Research Objects 2019 (RO2019) at IEEE eScience 2019 — http://www.researchobject.org/ro2019/ .