BIDS and NIDM: Improving Imaging Data Sharing Together

BIDS (the Brain Imaging Data Structure) and NIDM (the NeuroImaging Data Model) both grew out of the INCF NeuroImaging DAtaSHaring Task Force (NI-DASH) as parallel efforts to address different aspects of data sharing: specifically, the need for standards of organization and annotation. These development groups recently released a joint statement to better document the synergy between the two initiatives. In brief, the initiatives can be summarized as follows:

BIDS is a standard that prescribes a formal convention for naming and organizing neuroimaging data and metadata in a file system that simplifies documentation, communication and collaboration between users, and enables easier data validation and software development through consistent paths and naming for data files.
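
To make the idea of "consistent paths and naming" concrete, here is a minimal sketch of how a BIDS-style path for an anatomical image can be assembled from a few labels. The helper name `bids_path` and its parameters are our own illustration, not part of any BIDS tool, and the sketch covers only the subject and optional session entities; the full specification defines many more.

```python
from pathlib import PurePosixPath
from typing import Optional


def bids_path(subject: str, datatype: str, suffix: str,
              session: Optional[str] = None,
              extension: str = ".nii.gz") -> PurePosixPath:
    """Build a BIDS-style relative path (hypothetical helper, illustration only)."""
    # Entities appear both as nested folders and as filename prefixes.
    entities = [f"sub-{subject}"]
    if session is not None:
        entities.append(f"ses-{session}")
    filename = "_".join(entities + [suffix]) + extension
    return PurePosixPath(*entities, datatype, filename)


print(bids_path("01", "anat", "T1w"))
# sub-01/anat/sub-01_T1w.nii.gz
print(bids_path("01", "anat", "T1w", session="02"))
# sub-01/ses-02/anat/sub-01_ses-02_T1w.nii.gz
```

Because every dataset follows the same pattern, a validator or analysis tool can locate files without any dataset-specific configuration.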

NIDM is a Semantic Web-based metadata standard that helps capture and describe experimental data, analytic workflows and statistical results via the provenance of the data. NIDM uses consistent data descriptors in the form of machine accessible terminology, and a formal, extensible data model, which enables rich aggregation and query across datasets and domains.

BIDS has rapidly become a critically useful tool in neuroimaging data sharing by greatly reducing the barriers to documenting and sharing the imaging aspects of a study. Indeed, BIDS features specifically in the ReproNim 5-Steps to More Reproducible Neuroimaging Research recommendations. However, many of the nuances that characterize the meaning of a specific dataset or its derivatives require additional information to capture fully. For example, the details of IQ as a measure of intelligence, as may be reported in a BIDS ‘participants.tsv’ file, can depend upon the way the data are collected and reported: is it a ‘full scale’, ‘performance’ or ‘verbal’ IQ? Semantic markup, as supported by the standard descriptors of the NIDM representation, helps to disambiguate measures by annotating each measure (e.g. IQ, age) relative to the concept it represents, including documentation of the methods, units, ranges, etc. associated with that measure. Because the semantics of a measure are as important to understanding shared data as the format of the data representation, semantic annotation is also a key element of the ReproNim 5-Steps.
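
As a rough illustration of the IQ example, the sketch below builds a small data dictionary of the kind that can accompany a ‘participants.tsv’ file and checks which columns still lack any annotation. This is a lightweight stand-in, not the NIDM representation itself. The field names (`LongName`, `Description`, `Units`, `TermURL`) follow the ones we understand BIDS to recommend for tabular sidecar files; the column names, the `undocumented_columns` helper and the example.org URL are placeholders of our own.

```python
import json

# Hypothetical data dictionary for a participants.tsv with "age" and "iq" columns.
participants_sidecar = {
    "age": {
        "Description": "Age of the participant at the time of scanning",
        "Units": "years",
    },
    "iq": {
        "LongName": "Full-scale IQ",
        "Description": "Full-scale IQ score (not performance or verbal IQ)",
        # Placeholder URL: a real annotation would point at a term
        # in a controlled vocabulary.
        "TermURL": "https://example.org/terms/full-scale-iq",
    },
}


def undocumented_columns(header, sidecar):
    """Return the columns that have no entry in the data dictionary."""
    return [col for col in header
            if col != "participant_id" and col not in sidecar]


header = "participant_id\tage\tiq\thandedness".split("\t")
print(undocumented_columns(header, participants_sidecar))
# ['handedness']
print(json.dumps(participants_sidecar["iq"], indent=2))
```

With an entry like this, a reader or a program no longer has to guess which kind of IQ the column holds.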

We at ReproNim resonate with the conclusion of the Joint BIDS-NIDM Statement in our support for the:

“…integrated use of both of these standards, (BIDS plus NIDM can be defined as a “SemanticBIDS” representation), in order to both maximize the ‘ease of [re]use’ and ‘ease of sharing’ of neuroimaging data in support of greater research transparency. The BIDS and NIDM development communities will continue to work together to build tools for further synergies between these initiatives.”

As such, ReproNim strives to increase the efficiency of neuroimaging tools that help drive the adoption of these best-practices of data sharing in support of our overarching goal of enhancing overall neuroimaging research reproducibility.

(Discover, Replicate, Innovate)Repeat

“Reproducible by Design”


What is Reproducibility

In an era of ‘questioning everything’ that might affect the reproducibility of neuroimaging analysis, we start with a set of petites histoires that look at the implications of various choices that researchers routinely make, and often take for granted.

First, let’s set the stage. While there are many definitions of ‘reproducibility’, I’m a bit partial to the one reflected in the following figure: ReproSpectrum

Here, we define a number of concepts that we will return to over and over again in the course of our stories:

  • Re-executability (publication-level replication): The exact same data, operated on by the exact same analysis should yield the exact same result. Current publications, in order to maintain readability, do not typically provide a complete specification of the exact analysis method or access to the exact data. Many published neuroimaging experiments are therefore not precisely re-executable. This is a problem for reproducibility.
  • Generalizability: We can divide generalizability into three variations:
    • Generalization Variation 1: Exact Same Data + Nominally ‘Similar’ Analyses should yield a ‘Similar’ Result (e.g. FreeSurfer subcortical volumes compared to FSL FIRST)
    • Generalization Variation 2: Nominally ‘Similar’ Data + Exact Same Analysis should yield a ‘Similar’ Result (e.g. the cohort of kids with autism I am using compared to the cohort you are using)
    • Generalized Reproducibility: Nominally ‘Similar’ Data + Nominally ‘Similar’ Analyses should yield a ‘Similar’ Result
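
The four concepts above form a two-by-two grid: is the data exactly the same or only similar, and is the analysis exactly the same or only similar? A minimal sketch of that grid follows; the function name `reproducibility_category` is ours, purely for illustration.

```python
def reproducibility_category(same_data: bool, same_analysis: bool) -> str:
    """Map a comparison between two studies onto the categories defined above."""
    if same_data and same_analysis:
        return "Re-executability"
    if same_data:
        # Exact same data, nominally similar analysis.
        return "Generalization Variation 1"
    if same_analysis:
        # Nominally similar data, exact same analysis.
        return "Generalization Variation 2"
    return "Generalized Reproducibility"


for same_data in (True, False):
    for same_analysis in (True, False):
        label = reproducibility_category(same_data, same_analysis)
        print(f"same data={same_data}, same analysis={same_analysis}: {label}")
```

The sketch treats ‘same’ and ‘similar’ as a clean yes/no choice; in practice, deciding what counts as ‘similar’ is the hard part.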

We contend that ‘true findings’ in the neuroimaging literature should be able to achieve this ‘Generalized Reproducibility’ status in order to be valid claims.  As generalized reproducibility takes numerous claims and multiple publications in order to be established, most publications, themselves, are reporting what I would call ‘proto-claims’. These proto-claims may, or may not, end up being generalized. Since in our publications we do not really characterize data, analysis, and results very exhaustively, this lack of provenance permits the concept of ‘similar’ to have lots of wiggle room for interpretation (either to enhance similarity or to highlight differences, as desired by the interests of the author). In addition, we, as a community, tend to treat these individual reports, or proto-claims, as if they are established scientific claims (generalizably reproducible), since we do not really have any proper ‘system’ (apart from our own reading of the literature) to track the evolution of a claim.

While much of the work of ReproNim is to help establish easy-to-use end-user tools to exhaustively characterize data, analysis, and results (in order to enhance the community’s ability to explore the ‘reproducibility landscape’ of any given publication and its claims), it is equally important to work on the claims identification and tracking problem so that we can detect when our more ‘reproducible and re-executable’ procedures have established the ‘generalized reproducibility’ of a specific finding. Our next petites histoires will delve more deeply into the details of the neuroimaging analysis ecosystem.

 

Introducing the ReproNim Blog

ReproNim is a Center for Reproducible Neuroimaging Computation. As an NIH/NIBIB Biomedical Technology Research Center (BTRC) P41, ReproNim seeks to solve the ‘last mile’ problem for actual utilization of the myriad neuroinformatics resources that have been developed, but not routinely used, in support of the publication of more reproducible neuroimaging science. More details for the overall program can be found at our website: ReproNim website.

With this blog, the ReproNim team hopes to bring ‘little stories’ (les petites histoires) to our readers that highlight issues and solutions in the ongoing quest for enhanced reproducibility in neuroimaging. Feel free to comment, contact us (email: info@reproducibleimaging.org), or otherwise engage with the effort to:

(Discover, Replicate, Innovate)Repeat