ReproNim’s goal is to improve the reproducibility of neuroimaging science, while making the process of capturing and precisely describing all essential experimental details both easier and more efficient for investigators. Supporting re-executability is a challenging and multifaceted problem. In the service of this goal, ReproNim has taken a multi-pronged approach to developing our reproducible analysis framework, including development and delivery of software tools and training materials to facilitate and support re-executability practices. These tools are designed to make it easier (user-friendly) and more efficient for investigators to work with current technology to comprehensively describe, record, share and later find all the necessary experimental details of any given study as needed. Here we provide a high-level vision of the anticipated data flow in a FAIR image analysis framework.
ReproSystem: An Operational Framework for Reproducible Analysis
Components of the reproducible analysis framework include tools for data and software discovery, as well as tools for implementation of standardized description of data, results and workflows, and resources to enable varied execution options that facilitate re-executable analyses in all computational environments. Part of the framework's role is to capture a comprehensive machine-readable description of the metadata associated with the tools and data, which can be stored (and published and retrieved) in both the local/private (ReproPond) and shared/public (ReproLake) settings. For the sake of this discussion, we define ‘data’ as the imaging data as acquired from the scanner (DICOMs or NIfTIs, for example) and ‘metadata’ as the data about this imaging data (acquisition parameters, subject characteristics, etc.).
As an example of this framework, imagine that an investigator, Alice, has done a study in a new cohort to understand the effects of early mild cognitive impairment on cortical thickness. The metadata about this study (analysis details, results, etc.) reside locally with Alice in her ReproPond. Alice then looks for similar publicly available (published) studies in the ReproLake and finds that another investigator, Bob, has performed some related analysis. Alice can now compare or integrate these results. When Alice publishes her study and adds her own metadata to the ReproLake, other investigators will be better able to discover and work with her results. This can increase the citation and impact of her original study and places it in the context of other publications.
As outlined in the accompanying figure, the ReproSystem framework takes the full experimental cycle of neuroimaging data acquisition, analyses, and publication into account. This system explicitly includes both the data entering into an analysis (any of various types, such as scanner acquisition, BIDS data, behavioral data, or output of prior analysis) and the metadata+provenance of the analysis (e.g. software versions and computer operating system versions used, processing script, subject demographics, results description). The metadata maintains a link to its associated data, permitting flexibility regarding how each of these types of information sources can be used.
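To make the idea of a metadata+provenance record that keeps a link to its data more concrete, here is a minimal Python sketch. It is not a ReproNim schema or tool: the `ProvenanceRecord` class, its field names, and the `describe_run` helper are hypothetical, and a SHA-256 checksum is assumed as one possible way to link a record to the data it describes.

```python
import hashlib
import json
import platform
from dataclasses import asdict, dataclass


@dataclass
class ProvenanceRecord:
    """Hypothetical metadata+provenance record (illustrative field names only)."""
    data_sha256: str          # assumed link back to the data being described
    script: str               # processing script used for the analysis
    software_versions: dict   # e.g. interpreter and tool versions
    operating_system: str     # computer operating system version
    results_description: str = ""


def describe_run(data: bytes, script: str, results_description: str = "") -> ProvenanceRecord:
    """Build a record for one analysis run from the input bytes and script name."""
    return ProvenanceRecord(
        data_sha256=hashlib.sha256(data).hexdigest(),
        script=script,
        software_versions={"python": platform.python_version()},
        operating_system=platform.platform(),
        results_description=results_description,
    )


if __name__ == "__main__":
    # Placeholder bytes stand in for an imaging file; the script name is made up.
    record = describe_run(b"placeholder image bytes", "cortical_thickness.py")
    print(json.dumps(asdict(record), indent=2))
```

Because the record carries only a checksum of the data rather than the data itself, the two can be stored, shared, or withheld independently, which is the flexibility the paragraph above describes.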
Conceptually, this system starts with some kind of data (lower left corner of the figure), which may be generated anew within the experimenter’s lab or pulled from a private (e.g. lab) or public repository (such as OpenNeuro) to investigate a research question. Regardless of the specific source of data, it will ultimately be transformed by some series of analyses executed in the investigator’s specific computational environment. Each analytic step takes some input data and produces derived data, all of which are stored locally (in a Local Data Store). Importantly, each analytic step also generates metadata and provenance information linked to the data, the workflow (analysis scripts, for example), the computational and software environments of the analysis, and the resultant derived data, all of which are stored in a Local Metadata Store (aka ReproPond). Any or all of these data, metadata and provenance elements can be ‘published’ and made public: data can be promoted to a Public Data Repository, while metadata can be promoted to a Public Metadata Store (aka ReproNim’s ReproLake). Either or both of these published data and metadata elements can be included as research objects that are part of a re-executable publication (aka ReproPub).
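The flow described above can be sketched as a toy model in Python. This is an illustration under stated assumptions, not ReproNim software: the `Store` and `Workspace` classes, the `run_step` and `publish` methods, and the in-memory dictionaries standing in for real repositories are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """In-memory stand-in for a data or metadata repository (hypothetical)."""
    name: str
    items: dict = field(default_factory=dict)


@dataclass
class Workspace:
    """Toy model of the four stores named in the text."""
    local_data: Store = field(default_factory=lambda: Store("Local Data Store"))
    local_metadata: Store = field(default_factory=lambda: Store("ReproPond"))
    public_data: Store = field(default_factory=lambda: Store("Public Data Repository"))
    public_metadata: Store = field(default_factory=lambda: Store("ReproLake"))

    def run_step(self, step, input_key, output_key, transform):
        """Run one analytic step: store derived data and its provenance locally."""
        derived = transform(self.local_data.items[input_key])
        self.local_data.items[output_key] = derived
        self.local_metadata.items[output_key] = {
            "step": step,
            "input": input_key,
            "output": output_key,
        }
        return derived

    def publish(self, key, data=True, metadata=True):
        """Promote data, metadata, or both from the local to the public stores."""
        if data:
            self.public_data.items[key] = self.local_data.items[key]
        if metadata:
            self.public_metadata.items[key] = self.local_metadata.items[key]


if __name__ == "__main__":
    ws = Workspace()
    ws.local_data.items["raw"] = [1.0, 2.0, 3.0]  # placeholder for acquired data
    ws.run_step("scale", "raw", "scaled", lambda xs: [2 * x for x in xs])
    ws.publish("scaled", data=False)  # share only the metadata
    print(sorted(ws.public_metadata.items), sorted(ws.public_data.items))
```

Keeping `data` and `metadata` as separate flags in `publish` mirrors the point that either one, or both, can be made public independently.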
Efforts continue at ReproNim to actualize all elements of this vision, while attempting to leave the actual data analytic process feeling as similar as possible to that to which researchers have become accustomed. But under the hood of your favorite tool and your local data lives a thriving ecosystem, teeming with metadata, there at your disposal to support your complete scientific publication needs.