Tracking pipeline statuses

This guide shows how to track the completion status of BIDSification and processing pipelines.

BIDSification pipelines

The nipoppy track-curation command can be used to track dataset curation stages (reorganization and BIDSification). The command to create a curation status file from scratch is:

$ nipoppy track-curation --regenerate

Note

Without the --regenerate flag, nipoppy track-curation will only update the curation status for new participants in the manifest.

The above command creates or updates the curation status file at <NIPOPPY_PROJECT_ROOT>/sourcedata/imaging/curation_status.tsv. A summary of curation statuses can be displayed by running the nipoppy status command, which outputs a table with participant counts at different curation stages, like this:

      Participant counts by session at each Nipoppy checkpoint
             ╷             ╷              ╷               ╷
  session_id │ in_manifest │ in_pre_reorg │ in_post_reorg │ in_bids
 ════════════╪═════════════╪══════════════╪═══════════════╪═════════
      1      │      2      │      0       │       0       │    0
      2      │      2      │      0       │       0       │    0
             ╵             ╵              ╵               ╵

Note

The in_pre_reorg and in_post_reorg columns will be collapsed if all participants in the manifest have been BIDSified.

For each curation stage, the status is determined based on the presence of files in expected directories:

Column          Relevant directory
--------------  ------------------
in_pre_reorg    <NIPOPPY_PROJECT_ROOT>/sourcedata/imaging/pre_reorg/<PARTICIPANT_ID>/<SESSION_ID> (configurable)
in_post_reorg   <NIPOPPY_PROJECT_ROOT>/sourcedata/imaging/post_reorg/sub-<PARTICIPANT_ID>/ses-<SESSION_ID>
in_bids         <NIPOPPY_PROJECT_ROOT>/bids/sub-<PARTICIPANT_ID>/ses-<SESSION_ID>
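The directory-based checks above can be illustrated with a minimal Python sketch. This is not Nipoppy's actual implementation: the helper names has_files and curation_flags are hypothetical, the default (non-configured) directory layout is assumed, and "presence of files" is interpreted here as "the directory exists and contains at least one file at any depth".

```python
from pathlib import Path


def has_files(directory: Path) -> bool:
    """Return True if the directory exists and holds at least one file (any depth)."""
    return directory.is_dir() and any(p.is_file() for p in directory.rglob("*"))


def curation_flags(project_root: Path, participant_id: str, session_id: str) -> dict:
    """Hypothetical helper: check the three default curation directories."""
    imaging = project_root / "sourcedata" / "imaging"
    return {
        # pre_reorg uses raw IDs; the other two use the sub-/ses- prefixes
        "in_pre_reorg": has_files(imaging / "pre_reorg" / participant_id / session_id),
        "in_post_reorg": has_files(
            imaging / "post_reorg" / f"sub-{participant_id}" / f"ses-{session_id}"
        ),
        "in_bids": has_files(
            project_root / "bids" / f"sub-{participant_id}" / f"ses-{session_id}"
        ),
    }
```

For a participant whose data has only been copied into pre_reorg, this sketch would report in_pre_reorg as True and the other two flags as False.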

Processing pipelines

The nipoppy track-processing command can be used to track the completion status of processing pipelines. The minimal command is:

$ nipoppy track-processing --pipeline <PIPELINE_NAME>

Tip

The pipeline version and step name can be optionally specified using the --pipeline-version and --pipeline-step arguments respectively. By default, the latest version and the first step are used.

It is also possible to restrict the run to a single participant and/or session by using the --participant-id and --session-id arguments respectively.

The above command creates or updates the processing status file at <NIPOPPY_PROJECT_ROOT>/derivatives/processing_status.tsv. A summary of pipeline statuses can be displayed by running the nipoppy status command:

 Participant counts by session at each Nipoppy
                   checkpoint
             ╷             ╷         ╷
             │             │         │  mriqc
             │             │         │ 23.1.0
  session_id │ in_manifest │ in_bids │ default
 ════════════╪═════════════╪═════════╪═════════
      1      │      2      │    2    │    2
      2      │      2      │    2    │    2
             ╵             ╵         ╵

Tip

The processing status file can also be uploaded to https://digest.neurobagel.org for filtering and interactive visualizations.

Configuring a pipeline tracker

Pipeline completion criteria are defined through the tracker configuration file. The name of the tracker configuration file can be found in the pipeline’s config file at <NIPOPPY_PROJECT_ROOT>/pipelines/processing/<PIPELINE_NAME>-<PIPELINE_VERSION>/config.json; by default it is called tracker.json:

    "STEPS": [
        {
            "INVOCATION_FILE": "invocation.json",
            "DESCRIPTOR_FILE": "descriptor.json",
            "HPC_CONFIG_FILE": "hpc.json",
            "TRACKER_CONFIG_FILE": "tracker.json"
        }
    ],
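To look up the tracker configuration file names without opening the file by hand, something like the following Python sketch could be used. It assumes only what the excerpt above shows, namely a top-level STEPS list whose entries may carry a TRACKER_CONFIG_FILE key. The function name tracker_config_files is hypothetical and is not part of Nipoppy.

```python
import json
from pathlib import Path


def tracker_config_files(pipeline_config_path: Path) -> list:
    """Return the TRACKER_CONFIG_FILE value of every step that defines one."""
    config = json.loads(pipeline_config_path.read_text())
    return [
        step["TRACKER_CONFIG_FILE"]
        for step in config.get("STEPS", [])  # assumed structure, as in the excerpt
        if "TRACKER_CONFIG_FILE" in step
    ]
```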

Importantly, pipeline completion status is not inferred from exit codes, as trackers are run independently of the pipeline runners. Instead, the status is determined by checking for the presence of expected output files.

Here is an example of a tracker configuration file for the MRIQC pipeline, version 23.1.0:

{
    "PATHS": [
        "[[NIPOPPY_BIDS_PARTICIPANT_ID]]/[[NIPOPPY_BIDS_SESSION_ID]]/anat/[[NIPOPPY_BIDS_PARTICIPANT_ID]]_[[NIPOPPY_BIDS_SESSION_ID]]*_T1w.json",
        "[[NIPOPPY_BIDS_PARTICIPANT_ID]]_[[NIPOPPY_BIDS_SESSION_ID]]*_T1w.html"
    ]
}

These paths are expected to be relative to the <NIPOPPY_PROJECT_ROOT>/derivatives/<PIPELINE_NAME>/<PIPELINE_VERSION>/output directory.

Tip

“Glob” expressions (i.e., paths that include *) are allowed. An expression is considered found if at least one file matches it.
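The matching rule can be sketched in a few lines of Python. This is an illustration of the rule as described in this guide, not Nipoppy's own code; the function name tracker_status is hypothetical, and the patterns are assumed to have had their template strings already replaced.

```python
from pathlib import Path


def tracker_status(output_dir: Path, patterns: list) -> str:
    """Return "SUCCESS" if every pattern matches at least one path, else "FAIL"."""
    for pattern in patterns:
        # next(..., None) is None when the glob yields no match at all
        if next(output_dir.glob(pattern), None) is None:
            return "FAIL"
    return "SUCCESS"
```

A pattern such as sub-001_ses-1*_T1w.html is satisfied by a single matching file, so additional runs (e.g. run-02) do not change the outcome.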

Note

The template strings [[NIPOPPY_<ATTRIBUTE_NAME>]] are replaced at runtime by appropriate values. Available template strings are:

  • [[NIPOPPY_PARTICIPANT_ID]]: the participant ID without the sub- prefix

  • [[NIPOPPY_SESSION_ID]]: the session ID without the ses- prefix

  • [[NIPOPPY_BIDS_PARTICIPANT_ID]]: the participant ID with the sub- prefix

  • [[NIPOPPY_BIDS_SESSION_ID]]: the session ID with the ses- prefix
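As a rough illustration of the substitution, the Python sketch below replaces the four template strings listed above with plain string replacement. The function name resolve_template is hypothetical, and Nipoppy's own substitution mechanism may differ.

```python
def resolve_template(path: str, participant_id: str, session_id: str) -> str:
    """Replace the four NIPOPPY template strings in a tracker path."""
    replacements = {
        "[[NIPOPPY_PARTICIPANT_ID]]": participant_id,
        "[[NIPOPPY_SESSION_ID]]": session_id,
        "[[NIPOPPY_BIDS_PARTICIPANT_ID]]": f"sub-{participant_id}",
        "[[NIPOPPY_BIDS_SESSION_ID]]": f"ses-{session_id}",
    }
    for template, value in replacements.items():
        path = path.replace(template, value)
    return path


# Glob characters such as * are left untouched by the substitution.
print(resolve_template(
    "[[NIPOPPY_BIDS_PARTICIPANT_ID]]_[[NIPOPPY_BIDS_SESSION_ID]]*_T1w.html",
    participant_id="001",
    session_id="1",
))  # prints: sub-001_ses-1*_T1w.html
```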

Given a dataset with the following content in <NIPOPPY_PROJECT_ROOT>/derivatives/<PIPELINE_NAME>/<PIPELINE_VERSION>/output:

└── derivatives
    └── mriqc
        └── 23.1.0
            └── output
                ├── sub-001
                │   ├── figures
                │   │   ├── sub-001_ses-1_run-01_desc-background_T1w.svg
                │   │   └── sub-001_ses-1_run-01_desc-zoomed_T1w.svg
                │   └── ses-1
                │       └── anat
                │           ├── sub-001_ses-1_run-01_T1w.json
                │           └── sub-001_ses-1_run-02_T1w.json
                └── sub-001_ses-1_run-01_T1w.html

Running the tracker with the above configuration will result in the processing status file showing:

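participant_id  bids_participant_id  session_id  bids_session_id  pipeline_name  pipeline_version  pipeline_step  status
--------------  -------------------  ----------  ---------------  -------------  ----------------  -------------  -------
001             sub-001              1           ses-1            mriqc          23.1.0            default        SUCCESS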

Note

If there is an existing processing status file, the rows relevant to the specific pipeline, participants, and sessions will be updated. Other rows will be left as-is.
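The update behaviour can be pictured as an "upsert" keyed on the identifying columns. The Python sketch below is only an illustration: the function name update_status_rows is hypothetical, and the choice of key columns is an assumption based on the example table, not a documented detail of Nipoppy.

```python
# Assumed identifying columns (taken from the example table in this guide)
KEY_COLUMNS = (
    "participant_id",
    "session_id",
    "pipeline_name",
    "pipeline_version",
    "pipeline_step",
)


def update_status_rows(existing: list, new: list) -> list:
    """Replace rows of `existing` that share a key with a row in `new`; keep the rest."""

    def key(row: dict) -> tuple:
        return tuple(row[col] for col in KEY_COLUMNS)

    merged = {key(row): row for row in existing}
    # Matching keys are overwritten in place; unseen keys are appended
    merged.update({key(row): row for row in new})
    return list(merged.values())
```

Rows for other pipelines, participants, or sessions never share a key with the new rows, so they pass through unchanged.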

The status column can have the following values:

  • SUCCESS: all specified paths have been found

  • FAIL: at least one of the paths has not been found
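For a quick tally outside of nipoppy status, the processing status file can be read like any tab-separated file. The Python sketch below assumes the column names shown in the example table earlier in this guide (session_id, pipeline_name, status). The function name count_successes is hypothetical.

```python
import csv
from collections import Counter
from pathlib import Path


def count_successes(status_file: Path, pipeline_name: str) -> Counter:
    """Count SUCCESS rows per session_id for one pipeline in a processing status TSV."""
    counts = Counter()
    with status_file.open(newline="") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row["pipeline_name"] == pipeline_name and row["status"] == "SUCCESS":
                counts[row["session_id"]] += 1
    return counts
```

Called on <NIPOPPY_PROJECT_ROOT>/derivatives/processing_status.tsv, this returns a mapping from session ID to the number of participants with a SUCCESS status for the chosen pipeline.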