Research Objects are us…

Research Objects are a wonderful thing! Uh, but just what is a Research Object? One definition, according to Workflow4Ever, is “Research Objects (ROs) are semantically rich aggregations of resources that bring together data, methods and people in scientific investigations.” Uh², doesn’t a ‘semantically rich aggregation of resources that bring together data, methods and people in scientific investigations’ sound just like a weird definition of a scientific paper? Well, that’s what I thought. One of the core missions of ReproNim is to support the generation of completely re-executable research publications, aka the ReproPub (Kennedy 2019). A ReproPub can be used to verify a study’s claims and explore their generalizability through systematic variation of the input data and analytic components. Such a publication includes the complete description (i.e. provenance) of the experimental data, analysis workflow (script, pipeline), execution environment (OS version, hardware specification), and results that are used to establish that publication’s claim(s). Yet these elements (the data, workflow, environment, and results) are themselves Research Objects: scholarly products that may each have their own history, evolution, creators, credit, and reusability. This makes the ReproPub itself an overarching mechanism (Research Object) that aggregates these subsidiary research objects in support of a specific set of claims.

[Figure: ReproPub slide]
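To make the aggregation idea concrete, here is a minimal sketch in Python of a ReproPub as a container for its four constituent Research Objects. Every class, field, and identifier below is hypothetical and chosen only for illustration; it is not a ReproNim data model or an existing Research Object specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The four elements named in the text; the tuple name is an assumption.
REQUIRED_KINDS = ("data", "workflow", "environment", "results")


@dataclass
class ResearchObject:
    """One constituent object with its own identity and credit (hypothetical fields)."""
    kind: str                      # one of REQUIRED_KINDS
    identifier: str                # placeholder label, not a real DOI or URL
    creators: List[str] = field(default_factory=list)
    version: str = "unversioned"


@dataclass
class ReproPub:
    """An overarching object that aggregates the elements behind a set of claims."""
    claims: List[str]
    components: Dict[str, ResearchObject] = field(default_factory=dict)

    def add(self, ro: ResearchObject) -> None:
        # Reject kinds outside the four elements described in the text.
        if ro.kind not in REQUIRED_KINDS:
            raise ValueError(f"unknown research object kind: {ro.kind}")
        self.components[ro.kind] = ro

    def missing(self) -> List[str]:
        # Elements still needed before the publication is fully described.
        return [k for k in REQUIRED_KINDS if k not in self.components]


pub = ReproPub(claims=["example claim"])
pub.add(ResearchObject("data", "example-dataset", creators=["A. Author"]))
pub.add(ResearchObject("workflow", "example-pipeline"))
print(pub.missing())  # ['environment', 'results']
```

Keeping each element as its own object, rather than as free text inside the publication, is what would let the data, workflow, environment, and results carry separate creators and versions.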


In the ReproPub by Ghosh, et al. (2017), care is taken to explicitly annotate the constituent objects:

Additional objects, of course, can be included, such as hypothesis pre-registration, separation of image analysis workflow and statistical analysis workflow, etc.


Creating each of these complex Research Objects is still hard and needs to be made easier. Some tools ReproNim is working on in this area include:

Facilitating the evolution of publications to fully provisioned Research Objects has numerous benefits. A ReproPub embraces many of the recent advances in publication: treatment of data as a first-class research object (Honor, et al. 2016); the principles of software citation (Katz et al. 2019); and the FAIR (findable, accessible, interoperable and reusable) principles (Wilkinson et al. 2016) applied to the scientific process itself. The resulting scientific literature not only supports reproducibility better, but also allows a much fuller and more precise description, because the publication can state the conditions under which its claims do, or do not, generalize. Culturally, this evolution of publication practice should be perceived as a plus for the scientific community. Specifically, what used to be one publication that referred to data, processing, and a set of claims can now conceivably become numerous publications of distinct and independently creditable scientific output: a publication for the data, a publication for the processing approach, and a publication for the complete results, in addition to the publication for the conclusions and claims.


Elements of this work are based upon a presentation of a conference paper at the Workshop on Research Objects 2019 (RO2019) at IEEE eScience 2019.

ReproNim Training: Some Lessons Learnt

While the fields of life sciences, and neuroimaging in particular, are struggling with the apparent reproducibility crisis, the community is honing its skills and developing tools and best practices to foster more replicable and reproducible studies. Tools are key in this respect: without new practical and efficient tooling to design studies, analyze data, and verify results, research will keep moving too slowly (because of uncertainty and ambiguity in its results) in its effort to establish a fundamental understanding of the neurobiology of health and disease. It will then fail to respond appropriately, or as effectively as possible, to the needs of the populations suffering from brain diseases. Tools that provide ‘provenance aware’ support (capturing exactly how each step was carried out) for all phases of the analysis workflow are dearly needed: pipeline systems for analysis, better handling mechanisms for data and software (i.e. containers), and better ways to harmonize and track provenance.

But for many of us, the fundamental work has to be in the training of the research community. First, tools (software products and libraries) are only as good as their users. Powerful tools may be badly misused, either because they are complex or because they are used in situations where they should not be. Technically savvy personnel (researchers, developers, etc.) have a strong tendency to underestimate the difficulty of adopting and mastering new tools, and keeping a constant and strong connection with the majority of life sciences researchers requires a huge effort, one that is often incompatible with the rapid development of technologies. Tools change and evolve, and if we do not have a mechanism for continuously keeping researchers well trained, we will waste time and financial resources through the inefficiency of work performed without the benefit of the most up-to-date approaches.

There might be a never-ending, Sisyphean aspect to this. Is this a perpetual task, where we are constantly climbing the steep hill of new tools and new ways of conducting research? To some extent, probably. Research is, by its very essence, driven by (and rewarded for) the production of new techniques. What is the best way to ensure that these new techniques diffuse rapidly and are used appropriately in research? Continuous training.

The ReproNim answer to these challenges has been multi-faceted. It first comes with the realization that training must go beyond simply “how to use the tools”. Without insight into how and why the tools are constructed, how they operate, what their dependencies are, and how to completely document what has been done, one may be led to a “quick win” in running the tool, but in a superficial way that may perpetuate a lack of transparency. Without training to understand the inner mechanisms or conceptual aspects of a tool or approach, there is little doubt that any knowledge gained will only be short term. Trainers will have lost time presenting overly specific material, and trainees will rapidly get stuck and be left feeling powerless, facing a long list of magical commands that do not build a coherent framework for practical problem solving. This problem has been famously illustrated by the “how to draw an owl” meme: acquiring in-depth knowledge requires time (see Figure below).

Second, we also realized that training needs to be practical. Training only on theoretical and conceptual components would be useful, but clearly disconnected from the actual work and practices of our community. Often, neuroimagers are presented with a set of high-level concepts that are hard to understand and hard to put into practice. The most efficient training in applied sciences is through practical and goal-oriented work, which has to be hands-on.


Combining these two constraints (depth and practicality) has largely defined the ReproNim training program. Measures of success are hard to define, but the training methods and material have been oriented to answer these complementary needs. These materials have, to date, been presented in two main formats.

Hands on workshops and online material

First, we worked to implement some aspects of the inverted classroom, where lectures are done ‘on-line’ and class time is spent solving problems. We started with comprehensive, web-based online material that spans what we identified as the set of core components needed for reproducible neuroimaging research: 1) how to make data FAIR; 2) what the basic computer literacy requirements are; 3) how to build and use reproducible pipelines; and 4) what the key statistical concepts for reliable results are. Based on this core material (which could span weeks for each module), we established a series of ‘introductory workshops’. These 1 to 1.5 day workshops are designed to give trainees practical knowledge and an environment that they can bring back to their labs. The format is close to that of the Data Carpentry and Software Carpentry workshops, and brings some trainees to the level at which submitting (and understanding) a Git pull request on the training material (all of which is hosted on, and publicly accessible via, GitHub) becomes a possible task. This is crucial, because we are then putting these training materials in the hands of a larger community, distributing the work of finding errors or unclear aspects to the user community, and making ‘this’ training material ‘their’ training material. But because there isn’t enough time to actually teach all that is needed, we consider these workshops to be an illustration of the online material, to which trainees can return at their own pace. We are also finalizing a complete MOOC version of the materials using the Moodle platform, to help the reuse of this material in a more formal setting.

Train-the-trainer workshops: A pyramidal scheme.

Organizing teaching at workshops has been fruitful and rewarding. However, this solution does not scale: it would take too many resources to train the entire research community in this manner alone. More recently, we have focused on organizing “train the trainer” workshops in partnership with the International Neuroinformatics Coordinating Facility, where a small number of fellows were selected and invited to create a plan for their own training event, and hopefully create their own “train the trainer” workshop in their own institutional settings. Having investigators skilled in the use of these tools, embedded in various labs around the world, will help in the adoption of these best practices, with the trainers themselves becoming the go-to people at each of these sites.


If there is one take-home message, it would be that training is not a secondary aspect of reproducible research. If we are serious about changing the detailed practices of research, fostering the use of the best tools, and developing the capacity to adapt to an evolving landscape, then training must be at the heart of the effort. The goal is to grow a community of a new type of researcher, one who will invest in long-term and conceptual training in order to adapt rapidly and adopt the practices enabled by the new generations of tools, methods and software for reproducible and replicable research.