ReproNim Training: Some Lessons Learnt

While the fields of life sciences, and neuroimaging in particular, are struggling with the apparent reproducibility crisis, the community is honing its skills, developing tools and best practices to foster more replicable and reproducible studies. Tools are key in this respect: without new, practical, and efficient tooling to design studies, analyze data, and verify results, research will keep moving too slowly (because of the residual uncertainty and ambiguity in its results) in its quest to establish a fundamental understanding of the neurobiology of health and disease, and will therefore fail to respond appropriately and effectively to the needs of populations suffering from brain diseases. Tools that provide ‘provenance aware’ support (capturing exactly how each step was carried out) for all phases of the analysis workflow are dearly needed: pipeline systems for analysis, better mechanisms for handling data and software (e.g. containers), and better ways to harmonize data and track provenance.
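
As a concrete (and hypothetical) illustration of what ‘provenance aware’ support can look like in practice, the minimal sketch below uses the Python API of DataLad, one tool in this space; the dataset name, file names, and preprocessing command are invented for the example.

    # Minimal sketch: provenance-aware execution of one analysis step with DataLad.
    # The dataset path, input/output files, and command are hypothetical.
    import datalad.api as dl

    dl.create(path="my-study")       # initialize a version-controlled dataset
    ds = dl.Dataset("my-study")      # handle to that dataset

    # DataLad records the exact command, its declared inputs and outputs, and the
    # resulting changes in the dataset history, so the step can later be
    # re-executed verbatim (e.g. with `datalad rerun`).
    dl.run(
        cmd="python preprocess.py sub-01_T1w.nii.gz sub-01_T1w_preproc.nii.gz",
        dataset=ds,
        inputs=["sub-01_T1w.nii.gz"],
        outputs=["sub-01_T1w_preproc.nii.gz"],
        message="Preprocess anatomical scan for sub-01",
    )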

But for many of us, the fundamental work has to be in the training of the research community. First, tools (software products and libraries) are only as good as their users. Powerful tools may be badly misused, either because they are complex or because they are applied in situations where they should not be used. Technically savvy personnel (researchers, developers, etc.) have a strong tendency to underestimate the difficulty of adopting and mastering new tools, and keeping a constant, strong connection with the majority of life-sciences researchers requires a huge effort, often incompatible with the rapid pace at which technologies develop. Tools change and evolve, and if we do not have a mechanism for continuously keeping researchers well trained, we will waste time and financial resources on work performed without the benefit of the most up-to-date approaches.

There might be a never-ending, Sisyphean aspect to this. Is it a perpetual task, in which we are constantly climbing the steep hill of new tools and new ways of conducting research? To some extent, probably. Research is, by its very essence, driven by (and rewarded for) the production of new techniques. What is the best way to ensure that these new techniques diffuse rapidly and are used appropriately in research? Continuous training.

The ReproNim answer to these challenges has been multi-faceted. First, it comes with the realization that training must go beyond simply “how to use the tools”: without insight into how and why a tool is constructed, how it operates, what its dependencies are, and how to completely document what has been done, one may get a “quick win” by running the tool in a superficial way that perpetuates a lack of transparency. Without training on the inner mechanisms or conceptual aspects of a tool or approach, there is little doubt that any knowledge gained will be short-lived. Trainers will have lost time presenting overly specific material, and trainees will rapidly get stuck, left feeling powerless in front of a long list of magical commands that do not build a coherent framework for solving practical problems. This problem is famously illustrated by the “how to draw an owl” meme: acquiring in-depth knowledge requires time (see the figure below, sourced from https://cryptogenomicon.org/2016/09/08/from-zero-to-python/).

Second, we also realized that training needs to be practical. Training only on theoretical and conceptual components would be useful, but clearly disconnected from the actual work and practices of our community. Too often, neuroimagers are presented with a set of high-level concepts that are hard to understand and hard to put into practice. The most efficient training in the applied sciences is through practical, goal-oriented work, which has to be hands-on.

[Figure: the “how to draw an owl” meme, from https://cryptogenomicon.org/2016/09/08/from-zero-to-python/]

Combining these two constraints (depth and practicality) has largely defined the ReproNim training program. Measures of success are hard to define, but the training methods and materials have been designed to meet these complementary needs. These materials have, to date, been presented in two main formats.

Hands-on workshops and online material

First, we worked to implement some aspects of the inverted classroom, in which lectures are delivered online and class time is spent solving problems (see, for example, https://ii.library.jhu.edu/tag/inverted-classroom/). We started with comprehensive, web-based online material (see https://www.repronim.org/teach.html) that spans what we identified as the core components of reproducible neuroimaging research: 1) how to make data FAIR; 2) basic computer literacy requirements; 3) how to build and use reproducible pipelines; and 4) key statistical concepts for reliable results.

Based on this core material (which could span weeks of teaching for each module), we established a series of ‘introductory workshops’. These 1-1.5 day workshops are designed to give trainees practical knowledge and an environment that they can bring back to their labs. The format is close to that of the Data Carpentry (https://datacarpentry.org/) and Software Carpentry (https://software-carpentry.org/) workshops, and it brings some trainees to the level at which submitting (and understanding) a pull request on the training material (all of which is maintained in, and publicly accessible via, GitHub) becomes a feasible task. This is crucial, because it puts the training materials in the hands of a larger community, distributing the work of finding errors or unclear passages to the users and making ‘this’ training material ‘their’ training material. But because there is not enough time to actually teach everything that is needed, we consider these workshops an illustration of the online material, to which trainees can return at their own pace. We are also finalizing a complete MOOC version of the materials using the Moodle platform, to facilitate reuse of this material in more formal settings.

Train-the-trainer workshops: a pyramidal scheme

Organizing teaching at workshops has been fruitful and rewarding. However, this solution does not scale: it would take too many resources to train the entire research community in this manner alone. More recently, we have focused on organizing “train the trainer” workshops in partnership with the International Neuroinformatics Coordinating Facility (incf.org), in which a small number of fellows are selected and invited to create a plan for their own training event, and ultimately to run their own “train the trainer” workshop in their own institutional setting. Having investigators skilled in the use of these tools, embedded in various labs around the world, will help the adoption of these best practices, with the trainers themselves becoming the go-to person at each of these sites.

Summary

If there is one take-home message, it is that training is not a secondary aspect of reproducible research. If we are serious about changing the detailed practices of research, fostering the use of the best available tools, and developing the capacity to adapt to an evolving landscape, then training must be at the heart of the effort: growing a community of a new type of researcher, one who invests in long-term, conceptual training in order to adapt rapidly and adopt the practices enabled by new generations of tools, methods, and software for reproducible and replicable research.